Awni Hannun
|
c763fe1be0
|
default strict mode for module update and update_modules (#2239)
|
2025-06-05 15:27:02 -07:00 |
|
Cheng
|
52dc8c8cd5
|
Add profiler annotations in common primitives for CUDA backend (#2244)
|
2025-06-04 19:55:12 -07:00 |
|
Angelos Katharopoulos
|
aede70e81d
|
Perf regression fix (#2243)
|
2025-06-03 17:55:12 -07:00 |
|
Cheng
|
85a8beb5e4
|
Avoid atomic updates across CPU/GPU in CUDA event (#2231)
|
2025-06-03 16:49:06 -07:00 |
|
Cheng
|
0bb89e9e5f
|
Share more common code in Compiled (#2240)
* Share more common code in Compiled
* Remove build_lib_name
|
2025-06-03 16:48:50 -07:00 |
|
Cheng
|
5685ceb3c7
|
Avoid invoking allocator::malloc when creating CUDA event (#2232)
|
2025-06-03 16:48:40 -07:00 |
|
Suryash Malviya
|
0408ba0a76
|
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220)
* Implementing Complex Matmul using Karatsuba Algorithm
* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
|
2025-06-02 15:58:46 -07:00 |
|
Awni Hannun
|
cbad6c3093
|
version (#2237)
|
2025-06-02 15:58:33 -07:00 |
|
Cheng
|
1b021f6984
|
Fast primitives decide when to use the fallback (#2216)
|
2025-06-02 13:26:37 -07:00 |
|
Cheng
|
95b7551d65
|
Do not check event.is_signaled() in eval_impl (#2230)
|
2025-06-02 13:23:34 -07:00 |
|
Cheng
|
db5a7c6192
|
Add memory cache to CUDA backend (#2221)
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf can not be null
|
2025-05-30 12:12:54 -07:00 |
|
Awni Hannun
|
6ef2f67e7f
|
5bit quants (#2226)
* 5bit quants
* 5bit quants
|
2025-05-30 12:12:10 -07:00 |
|
Cheng
|
f76ee1ffd2
|
Move some dims utils to common (#2223)
|
2025-05-29 06:48:30 -07:00 |
|
Cheng
|
54a71f270a
|
Remove unused defines (#2217)
|
2025-05-23 06:14:58 -07:00 |
|
Awni Hannun
|
55b4062dd8
|
copyright in docs (#2214)
|
2025-05-21 17:13:04 -07:00 |
|
Cheng
|
79071bfba4
|
Fix out-of-bounds default value in logsumexp/softmax (#2213)
|
2025-05-21 07:25:16 -07:00 |
|
Cheng
|
7774b87cbd
|
Remove redundant simd_sum in logsumexp (#2210)
|
2025-05-21 07:25:03 -07:00 |
|
Cheng
|
35c87741cf
|
Build for compute capability 70 instead of 75 (#2209)
|
2025-05-20 19:42:48 -07:00 |
|
Jack Wind
|
4cbe605214
|
Feat: Allow per-target Metal debug flags (#2201)
* feat: allow per-target Metal debug flags
* formatting fix
|
2025-05-20 10:22:26 -07:00 |
|
Clement Liaw
|
ab8883dd55
|
include mlx::core::version() symbols in the mlx static library (#2207)
|
2025-05-20 07:39:11 -07:00 |
|
Awni Hannun
|
eebe73001a
|
fix large arg reduce (#2206)
|
2025-05-19 13:10:44 -07:00 |
|
Angelos Katharopoulos
|
0359bf02c9
|
Nearest upsample (#2202)
|
2025-05-19 11:23:38 -07:00 |
|
Cheng
|
237f9e58a8
|
Fix BEFORE keyword in target_include_directories (#2204)
|
2025-05-19 06:10:44 -07:00 |
|
Awni Hannun
|
8576e6fe36
|
fix conv2d bug + faster conv 1d (#2195)
* fix conv2d bug + faster conv 1d
* revert sort + flaky test
|
2025-05-18 06:05:11 -07:00 |
|
Angelos Katharopoulos
|
0654543dcc
|
Add complex eigh (#2191)
|
2025-05-18 00:18:43 -07:00 |
|
Awni Hannun
|
48ef3e74e2
|
reduce vjp for all and any (#2193)
|
2025-05-16 08:38:49 -07:00 |
|
Cheng
|
7d4b378952
|
Include cuda_bf16.h for bfloat16 overloads (#2192)
* Include cuda_bf16.h for bfloat16 overloads
* Add NO_GPU_MULTI(Eig) in cuda backend
|
2025-05-16 06:44:42 -07:00 |
|
Jack Wind
|
7ff5c41e06
|
Add set_threadgroup_memory_length to CommandEncoder (#2183)
|
2025-05-16 00:28:03 -07:00 |
|
Awni Hannun
|
602f43e3d1
|
fix conv grad (#2187)
|
2025-05-15 19:20:36 -07:00 |
|
Awni Hannun
|
a2cadb8218
|
real and imag properties (#2189)
|
2025-05-15 18:17:50 -07:00 |
|
Awni Hannun
|
c1eb9d05d9
|
non-symmetric eig and eigh (#2188)
|
2025-05-15 13:01:44 -07:00 |
|
Angelos Katharopoulos
|
cf6c939e86
|
Fix some complex vjps (#2178)
|
2025-05-14 23:37:12 -07:00 |
|
Angelos Katharopoulos
|
130df35e1b
|
Add random normal distribution for complex numbers (#2182)
|
2025-05-13 22:43:45 -07:00 |
|
Cheng
|
0751263dec
|
Fix typo in row_reduce_small (#2179)
|
2025-05-13 20:19:54 -07:00 |
|
Cheng
|
eca2f3eb97
|
Add remove_index utility (#2173)
|
2025-05-13 17:09:56 -07:00 |
|
Angelos Katharopoulos
|
3aa9cf3f9e
|
Fix put_along_axis for empty arrays (#2181)
|
2025-05-13 14:27:53 -07:00 |
|
Awni Hannun
|
8f3d208dce
|
Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177)
* handle hadamard and addmm on empty inputs
* fix
|
2025-05-12 10:48:57 -07:00 |
|
Ivan Fioravanti
|
caaa3f1f8c
|
Small typos in mx.metal deprecations (#2176)
|
2025-05-11 06:03:47 -07:00 |
|
Awni Hannun
|
659a51919f
|
patch bump (#2162)
|
2025-05-09 14:35:14 -07:00 |
|
Awni Hannun
|
6661387066
|
Fix fft for integer overflow (#2161)
|
2025-05-09 14:25:12 -07:00 |
|
ATurker
|
a7fae8a176
|
fix: conv_general differences between gpu, cpu (#2070)
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
|
2025-05-09 10:26:52 -07:00 |
|
Cheng
|
0cae0bdac8
|
CUDA backend: backbone (#2075)
|
2025-05-06 21:26:46 -07:00 |
|
Awni Hannun
|
5a1a5d5ed1
|
fix input coherent kernel launch (#2153)
|
2025-05-05 17:30:50 -07:00 |
|
Cheng
|
1683975acf
|
Move common gpu primitives to backend/gpu (#2145)
|
2025-05-05 13:45:29 -07:00 |
|
Awni Hannun
|
af705590ac
|
fix batched vector sdpa (#2152)
|
2025-05-05 13:13:03 -07:00 |
|
Awni Hannun
|
825124af8f
|
fix bw for elementwise ops (#2151)
* fix bw for elementwise ops
* add compile
* fix
* fix
* fix
* fix
|
2025-05-05 06:15:04 -07:00 |
|
Awni Hannun
|
9c5e7da507
|
fix compile merging (#2150)
|
2025-05-02 15:08:50 -07:00 |
|
Angelos Katharopoulos
|
481349495b
|
GPU Hadamard for large N (#1879)
|
2025-05-01 17:19:17 -07:00 |
|
Awni Hannun
|
9daa6b003f
|
fix shapeless export (#2148)
|
2025-05-01 15:02:02 -07:00 |
|
Angelos Katharopoulos
|
a3a632d567
|
Fix the launcher when ran locally (#2147)
|
2025-05-01 12:56:09 -07:00 |
|
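Note on #2220 (Karatsuba complex matmul): the log entry names the technique but does not show the kernel. The sketch below is not taken from that commit; it is a generic, pure-Python illustration of the three-multiplication (Karatsuba-style) trick for complex matrix products, assuming matrices are given as separate real and imaginary nested lists. All function names here are hypothetical.

```python
# Illustrative only: generic Karatsuba-style complex matmul, not MLX code.
# A = a_re + i*a_im, B = b_re + i*b_im, each part a nested list of reals.

def real_matmul(a, b):
    """Plain real matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def mat_add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    """Elementwise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def karatsuba_complex_matmul(a_re, a_im, b_re, b_im):
    """Return (real, imag) parts of A @ B using 3 real matmuls instead of 4."""
    p1 = real_matmul(a_re, b_re)                                # Ar Br
    p2 = real_matmul(a_im, b_im)                                # Ai Bi
    p3 = real_matmul(mat_add(a_re, a_im), mat_add(b_re, b_im))  # (Ar+Ai)(Br+Bi)
    real = mat_sub(p1, p2)                # Ar Br - Ai Bi
    imag = mat_sub(mat_sub(p3, p1), p2)   # Ar Bi + Ai Br
    return real, imag
```

The saving is one real matrix product per complex product, paid for with a few extra elementwise additions and subtractions; whether that trade wins in practice depends on matrix size and backend.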