Cheng
0bb89e9e5f
Share more common code in Compiled ( #2240 )
...
* Share more common code in Compiled
* Remove build_lib_name
2025-06-03 16:48:50 -07:00
Awni Hannun
6ef2f67e7f
5bit quants ( #2226 )
...
* 5bit quants
* 5bit quants
2025-05-30 12:12:10 -07:00
Angelos Katharopoulos
0654543dcc
Add complex eigh ( #2191 )
2025-05-18 00:18:43 -07:00
Awni Hannun
c1eb9d05d9
non-symmetric eig and eigh ( #2188 )
2025-05-15 13:01:44 -07:00
Cheng
eca2f3eb97
Add remove_index utility ( #2173 )
2025-05-13 17:09:56 -07:00
Awni Hannun
8f3d208dce
Close a couple edge case bugs: hadamard and addmm on empty inputs ( #2177 )
...
* handle hadamard and addmm on empty inputs
* fix
2025-05-12 10:48:57 -07:00
ATurker
a7fae8a176
fix: conv_general differences between gpu, cpu ( #2070 )
...
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-05-09 10:26:52 -07:00
Awni Hannun
f1606486d2
Generalize gpu backend ( #2138 )
...
* generalize gpu backend
* fix no_gpu build
* fix no_gpu build
* generalize gpu backend
2025-04-30 09:08:17 -07:00
hdeng-apple
86984cad68
Remove static initializers ( #2059 )
...
* Remove static initializers in device.cpp, load.cpp, pocketfft.h
* Remove static initializer InTracing::trace_stack
* Remove static initializer of CompilerCache cache
* Revert changes in pocketfft.h
* Remove duplicate private section of thread_pool()
2025-04-24 06:14:49 -07:00
Yury Popov
1d2c9d6a07
Complex scan ( #2094 )
2025-04-22 18:56:28 -07:00
Yury Popov
e9e268336b
LogCumSumExp ( #2069 )
2025-04-13 01:27:29 -07:00
Angelos Katharopoulos
ddaa4b7dcb
Fix the test and add custom min/max reductions for uncommon MPI types ( #2060 )
2025-04-10 17:01:17 -07:00
Cheng
dfae2c6989
Fix MSVC build due to use of M_LN2 ( #2058 )
2025-04-10 07:41:41 -07:00
Anastasiia Filippova
515f104926
Min / max reductions ( #2041 )
2025-04-09 23:22:20 -07:00
Awni Hannun
f2c85308c1
add a half simd gemm fallback ( #2046 )
...
* add a half simd gemm fallback
* nit
2025-04-07 09:31:29 -07:00
Awni Hannun
c23888acd7
Fix build warning ( #2033 )
2025-04-01 14:42:27 -07:00
Awni Hannun
de5f38fd48
Custom logsumexp ( #2028 )
...
* initial custom logsumexp
* more tests
* comments + fix
2025-03-31 07:36:55 -07:00
Jesper Stemann Andersen
5f5770e3a2
Fix CPU sign for unsigned ints ( #2024 )
...
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-03-30 17:56:59 -07:00
Awni Hannun
28f39e9038
Log for complex numbers in Metal ( #2025 )
...
* Log for complex numbers in Metal
* fix log2
2025-03-30 17:04:38 -07:00
Angelos Katharopoulos
4eef8102c9
Distributed layers ( #1270 )
2025-03-21 13:52:17 -07:00
Awni Hannun
7b7e2352cd
fix malloc or wait deadlock ( #1976 )
2025-03-20 16:48:43 -07:00
Awni Hannun
3c164fca8c
Fix multistream GPU deadlock ( #1969 )
...
* fix multistream GPU deadlock
* comments
2025-03-20 07:19:47 -07:00
Awni Hannun
736a340478
reduce binary size ( #1952 )
2025-03-11 06:30:44 -07:00
Awni Hannun
c4230747a1
redesign for faster cpu/gpu synch ( #1869 )
...
* redesign for faster cpu/gpu synch
* load + more async CPU
* use command encoder API and move more ops to use it
* make fence back-end generic + CPU only fence
* faster build
* fix async eval
* fixes + handle temporaries
* fix / improve cpu conv
* remove unused status, fix siblings
* fix extensions
* fix
* fix no cpu build
* format
* comments
* fix perf regression, remove unecessary abort
* fix events, task limit cpu
* fix waiting
* fix donation / temporaries in normalization
2025-03-06 19:23:38 -08:00
Alex Barron
fd0d63ba5b
Affine quant always in fp32 ( #1925 )
...
* do affine quant in fp32
* static cast
2025-03-04 17:50:19 -08:00
Abe Leininger
3835a428c5
Adds nuclear norm support ( #1894 )
...
* adjust norm unit test tolerance
2025-03-04 13:26:02 -08:00
Awni Hannun
7d042f17fe
Double for lapack ( #1904 )
...
* double for lapack ops
* add double support for lapack ops
2025-02-25 11:39:36 -08:00
Awni Hannun
7face5d9fd
fix cpu compile ( #1897 )
2025-02-24 14:10:30 -08:00
Awni Hannun
2d0f384b6f
fix simd erf_inv ( #1896 )
2025-02-24 13:57:47 -08:00
Jesper Stemann Andersen
0ebc8a3d25
Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp ( #1890 )
2025-02-20 12:02:12 -08:00
Awni Hannun
bbda0fdbdb
Allow non-square lu ( #1889 )
2025-02-20 08:13:23 -08:00
Abe Leininger
344a29506e
Enforce triangular matrix form in tri_inv ( #1876 )
...
* fix tri_inv bug
* Revert "fix tri_inv bug"
This reverts commit b74b290201 .
* Make sure that tri_inv returns a triangular matrix
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-02-19 12:42:33 -08:00
Awni Hannun
5274c3c43f
compiler warnings are errors ( #1870 )
2025-02-17 00:07:49 -08:00
Alex Barron
5cd97f7ffe
Bitwise Inverse ( #1862 )
...
* add bitwise inverse
* add vmap + fix nojit
* inverse -> invert
* add to compile + remove unused
2025-02-13 08:44:14 -08:00
Cheng
142b77751d
Fix compilation error on Windows ( #1844 )
2025-02-10 19:53:05 -08:00
Abe Leininger
a5ededf1c3
CPU LU factorization and linear solvers ( #1451 )
...
* linalg solve backend
* nits
* more nits + fix
* luf primitive and lu, solve, and solve_triangular backends
* changes / nits
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-02-10 12:32:24 -08:00
Awni Hannun
1c0c118f7c
Fp64 on the CPU ( #1843 )
...
* add fp64 data type
* clean build
* update docs
* fix bug
2025-02-07 15:52:22 -08:00
Awni Hannun
af1b725fda
Fix a couple of slicing bugs ( #1827 )
...
* fix a few bugs
* fix conv grad
* speedup test
* comment
2025-02-05 19:50:08 -08:00
Awni Hannun
1156c84e86
Refactor common into cpu specific and truly common ( #1817 )
...
* refactor
* fix extension example
* fix no-cpu
2025-02-03 15:58:02 -08:00