Awni Hannun
cbad6c3093
version ( #2237 )
2025-06-02 15:58:33 -07:00
Cheng
1b021f6984
Fast primitives decide when to use the fallback ( #2216 )
2025-06-02 13:26:37 -07:00
Cheng
95b7551d65
Do not check event.is_signaled() in eval_impl ( #2230 )
2025-06-02 13:23:34 -07:00
Cheng
db5a7c6192
Add memory cache to CUDA backend ( #2221 )
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf can not be null
2025-05-30 12:12:54 -07:00
Awni Hannun
6ef2f67e7f
5bit quants ( #2226 )
* 5bit quants
* 5bit quants
2025-05-30 12:12:10 -07:00
Cheng
f76ee1ffd2
Move some dims utils to common ( #2223 )
2025-05-29 06:48:30 -07:00
Cheng
54a71f270a
Remove unused defines ( #2217 )
2025-05-23 06:14:58 -07:00
Awni Hannun
55b4062dd8
copyright in docs ( #2214 )
2025-05-21 17:13:04 -07:00
Cheng
79071bfba4
Fix out-of-bounds default value in logsumexp/softmax ( #2213 )
2025-05-21 07:25:16 -07:00
Cheng
7774b87cbd
Remove redundant simd_sum in logsumexp ( #2210 )
2025-05-21 07:25:03 -07:00
Cheng
35c87741cf
Build for compute capability 70 instead of 75 ( #2209 )
2025-05-20 19:42:48 -07:00
Jack Wind
4cbe605214
Feat: Allow per-target Metal debug flags ( #2201 )
* feat: allow per-target Metal debug flags
* formatting fix
2025-05-20 10:22:26 -07:00
Clement Liaw
ab8883dd55
include mlx::core::version() symbols in the mlx static library ( #2207 )
2025-05-20 07:39:11 -07:00
Awni Hannun
eebe73001a
fix large arg reduce ( #2206 )
2025-05-19 13:10:44 -07:00
Angelos Katharopoulos
0359bf02c9
Nearest upsample ( #2202 )
2025-05-19 11:23:38 -07:00
Cheng
237f9e58a8
Fix BEFORE keyword in target_include_directories ( #2204 )
2025-05-19 06:10:44 -07:00
Awni Hannun
8576e6fe36
fix conv2d bug + faster conv 1d ( #2195 )
* fix conv2d bug + faster conv 1d
* revert sort + flaky test
2025-05-18 06:05:11 -07:00
Angelos Katharopoulos
0654543dcc
Add complex eigh ( #2191 )
2025-05-18 00:18:43 -07:00
Awni Hannun
48ef3e74e2
reduce vjp for all and any ( #2193 )
2025-05-16 08:38:49 -07:00
Cheng
7d4b378952
Include cuda_bf16.h for bfloat16 overloads ( #2192 )
* Include cuda_bf16.h for bfloat16 overloads
* Add NO_GPU_MULTI(Eig) in cuda backend
2025-05-16 06:44:42 -07:00
Jack Wind
7ff5c41e06
Add set_threadgroup_memory_length to CommandEncoder ( #2183 )
2025-05-16 00:28:03 -07:00
Awni Hannun
602f43e3d1
fix conv grad ( #2187 )
2025-05-15 19:20:36 -07:00
Awni Hannun
a2cadb8218
real and imag properties ( #2189 )
2025-05-15 18:17:50 -07:00
Awni Hannun
c1eb9d05d9
non-symmetric eig and eigh ( #2188 )
2025-05-15 13:01:44 -07:00
Angelos Katharopoulos
cf6c939e86
Fix some complex vjps ( #2178 )
2025-05-14 23:37:12 -07:00
Angelos Katharopoulos
130df35e1b
Add random normal distribution for complex numbers ( #2182 )
2025-05-13 22:43:45 -07:00
Cheng
0751263dec
Fix typo in row_reduce_small ( #2179 )
2025-05-13 20:19:54 -07:00
Cheng
eca2f3eb97
Add remove_index utility ( #2173 )
2025-05-13 17:09:56 -07:00
Angelos Katharopoulos
3aa9cf3f9e
Fix put_along_axis for empty arrays ( #2181 )
2025-05-13 14:27:53 -07:00
Awni Hannun
8f3d208dce
Close a couple edge case bugs: hadamard and addmm on empty inputs ( #2177 )
* handle hadamard and addmm on empty inputs
* fix
2025-05-12 10:48:57 -07:00
Ivan Fioravanti
caaa3f1f8c
Small typos in mx.metal deprecations ( #2176 )
2025-05-11 06:03:47 -07:00
Awni Hannun
659a51919f
patch bump ( #2162 )
2025-05-09 14:35:14 -07:00
Awni Hannun
6661387066
Fix fft for integer overflow ( #2161 )
2025-05-09 14:25:12 -07:00
ATurker
a7fae8a176
fix: conv_general differences between gpu, cpu ( #2070 )
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-05-09 10:26:52 -07:00
Cheng
0cae0bdac8
CUDA backend: backbone ( #2075 )
2025-05-06 21:26:46 -07:00
Awni Hannun
5a1a5d5ed1
fix input coherent kernel launch ( #2153 )
2025-05-05 17:30:50 -07:00
Cheng
1683975acf
Move common gpu primitives to backend/gpu ( #2145 )
2025-05-05 13:45:29 -07:00
Awni Hannun
af705590ac
fix batched vector sdpa ( #2152 )
2025-05-05 13:13:03 -07:00
Awni Hannun
825124af8f
fix bw for elementwise ops ( #2151 )
* fix bw for elementwise ops
* add compile
* fix
* fix
* fix
* fix
2025-05-05 06:15:04 -07:00
Awni Hannun
9c5e7da507
fix compile merging ( #2150 )
2025-05-02 15:08:50 -07:00
Angelos Katharopoulos
481349495b
GPU Hadamard for large N ( #1879 )
2025-05-01 17:19:17 -07:00
Awni Hannun
9daa6b003f
fix shapeless export ( #2148 )
2025-05-01 15:02:02 -07:00
Angelos Katharopoulos
a3a632d567
Fix the launcher when ran locally ( #2147 )
2025-05-01 12:56:09 -07:00
Awni Hannun
e496c5a4b4
fix integer overflow in qmm ( #2143 )
2025-04-30 09:28:56 -07:00
Cheng
ea890d8710
Remove metal-only tests ( #2139 )
2025-04-30 09:08:39 -07:00
Awni Hannun
aa5d84f102
Allow quant layer to be unfrozen ( #2142 )
2025-04-30 09:08:29 -07:00
Awni Hannun
f1606486d2
Generalize gpu backend ( #2138 )
* generalize gpu backend
* fix no_gpu build
* fix no_gpu build
* generalize gpu backend
2025-04-30 09:08:17 -07:00
Cheng
87720a8908
Fix building with uv ( #2141 )
2025-04-30 06:04:07 -07:00
Aashiq Dheeraj
bb6565ef14
add fftshift and ifftshift fft helpers ( #2135 )
* add fftshift and ifftshift fft helpers
* address comments
* axes have to be iterable
* fix fp error in roll + add test
---------
Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>
2025-04-29 22:13:45 -07:00
Awni Hannun
7bb063bcb3
Enable vjp for quantized scale and bias ( #2129 )
* Enable vjp for quantized scale and bias
* higher tol
2025-04-29 13:03:09 -07:00