Commit Graph

1193 Commits

Author SHA1 Message Date
Awni Hannun
c098245664 docs update 2025-06-04 01:01:49 +00:00
Awni Hannun
981bf7ae2b docs update 2025-06-04 01:01:49 +00:00
Awni Hannun
ce95a29690 docs update 2025-06-04 01:01:49 +00:00
Awni Hannun
9e69a72b8c docs update 2025-06-04 01:01:48 +00:00
Awni Hannun
17470bf630 remove uneeded files in docs 2025-06-04 01:01:48 +00:00
Awni Hannun
a693e6e1d8 update docs 2025-06-04 01:01:48 +00:00
Awni Hannun
fd34610634 docs update 2025-06-04 01:01:48 +00:00
Awni Hannun
84bebc2161 docs up 2025-06-04 01:01:48 +00:00
Awni Hannun
9882295582 docs up 2025-06-04 01:01:48 +00:00
Awni Hannun
ebd913400a docs update 2025-06-04 01:01:48 +00:00
Awni Hannun
217cdf3fc9 docs 2025-06-04 01:01:48 +00:00
Awni Hannun
43cd655ba1 docs 2025-06-04 01:01:48 +00:00
Awni Hannun
8c406bcb9b update docs 2025-06-04 01:01:48 +00:00
Awni Hannun
01489e172d docs 2025-06-04 01:01:48 +00:00
Awni Hannun
616449e363 docs 2025-06-04 01:01:48 +00:00
Awni Hannun
a66e6d3214 docs 2025-06-04 01:01:48 +00:00
Awni Hannun
d3d0ad9564 docs 2025-06-04 01:01:47 +00:00
Awni Hannun
a60a600c6a docs 2025-06-04 01:01:47 +00:00
Awni Hannun
e84ebcf0b9 docs 2025-06-04 01:01:47 +00:00
Awni Hannun
372f2ac025 docs 2025-06-04 01:01:47 +00:00
Awni Hannun
80322b562e docs 2025-06-04 01:01:47 +00:00
Awni Hannun
fbd10a48d4 docs 2025-06-04 01:01:47 +00:00
Angelos Katharopoulos
aede70e81d
Perf regression fix (#2243) 2025-06-03 17:55:12 -07:00
Cheng
85a8beb5e4
Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
Cheng
0bb89e9e5f
Share more common code in Compiled (#2240)
* Share more common code in Compiled

* Remove build_lib_name
2025-06-03 16:48:50 -07:00
Cheng
5685ceb3c7
Avoid invoking allocator::malloc when creating CUDA event (#2232) 2025-06-03 16:48:40 -07:00
Suryash Malviya
0408ba0a76
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220)
* Implementing Complex Matmul using Karatsuba Algorithm

* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them

* fix

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-02 15:58:46 -07:00
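
Note on #2220: the commit title refers to a Karatsuba-style complex matmul. The following is a minimal sketch of the usual three-multiplication scheme, assuming that is what the commit means; it is not taken from the MLX source. The helper names (`matmul`, `add`, `sub`, `complex_matmul_3m`) are hypothetical, and matrices are plain lists of lists for illustration. For A = A_re + i·A_im and B = B_re + i·B_im, the product needs three real matmuls instead of four.

```python
def matmul(a, b):
    """Plain real matrix product of two lists-of-lists (illustrative helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Elementwise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def complex_matmul_3m(a_re, a_im, b_re, b_im):
    """Complex matrix product using three real matmuls (hypothetical name).

    real = A_re @ B_re - A_im @ B_im
    imag = (A_re + A_im) @ (B_re + B_im) - A_re @ B_re - A_im @ B_im
    """
    p1 = matmul(a_re, b_re)                        # A_re @ B_re
    p2 = matmul(a_im, b_im)                        # A_im @ B_im
    p3 = matmul(add(a_re, a_im), add(b_re, b_im))  # (A_re + A_im) @ (B_re + B_im)
    real = sub(p1, p2)
    imag = sub(sub(p3, p1), p2)
    return real, imag


# Example inputs: A = [[1+2j, 3-1j], [1j, 2]], B = [[2-1j, 1+1j], [1, -1+2j]]
a_re, a_im = [[1, 3], [0, 2]], [[2, -1], [1, 0]]
b_re, b_im = [[2, 1], [1, -1]], [[-1, 1], [0, 2]]
real, imag = complex_matmul_3m(a_re, a_im, b_re, b_im)
```

The trade-off is one fewer matmul in exchange for a few extra elementwise additions and subtractions, which can change rounding behaviour in floating point.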
Awni Hannun
cbad6c3093
version (#2237) 2025-06-02 15:58:33 -07:00
Cheng
1b021f6984
Fast primitives decide when to use the fallback (#2216) 2025-06-02 13:26:37 -07:00
Cheng
95b7551d65
Do not check event.is_signaled() in eval_impl (#2230) 2025-06-02 13:23:34 -07:00
Cheng
db5a7c6192
Add memory cache to CUDA backend (#2221)
* Move BufferCache out of allocator

* Add memory cache to cuda backend allocator

* Simplify BufferCache assuming buf can not be null
2025-05-30 12:12:54 -07:00
Awni Hannun
6ef2f67e7f
5bit quants (#2226)
* 5bit quants

* 5bit quants
2025-05-30 12:12:10 -07:00
Cheng
f76ee1ffd2
Move some dims utils to common (#2223) 2025-05-29 06:48:30 -07:00
Cheng
54a71f270a
Remove unused defines (#2217) 2025-05-23 06:14:58 -07:00
Awni Hannun
55b4062dd8
copyright in docs (#2214) 2025-05-21 17:13:04 -07:00
Cheng
79071bfba4
Fix out-of-bounds default value in logsumexp/softmax (#2213) 2025-05-21 07:25:16 -07:00
Cheng
7774b87cbd
Remove redundant simd_sum in logsumexp (#2210) 2025-05-21 07:25:03 -07:00
Cheng
35c87741cf
Build for compute capability 70 instead of 75 (#2209) 2025-05-20 19:42:48 -07:00
Jack Wind
4cbe605214
Feat: Allow per-target Metal debug flags (#2201)
* feat: allow per-target Metal debug flags

* formatting fix
2025-05-20 10:22:26 -07:00
Clement Liaw
ab8883dd55
include mlx::core::version() symbols in the mlx static library (#2207) 2025-05-20 07:39:11 -07:00
Awni Hannun
eebe73001a
fix large arg reduce (#2206) 2025-05-19 13:10:44 -07:00
Angelos Katharopoulos
0359bf02c9
Nearest upsample (#2202) 2025-05-19 11:23:38 -07:00
Cheng
237f9e58a8
Fix BEFORE keyword in target_include_directories (#2204) 2025-05-19 06:10:44 -07:00
Awni Hannun
8576e6fe36
fix conv2d bug + faster conv 1d (#2195)
* fix conv2d bug + faster conv 1d

* revert sort + flaky test
2025-05-18 06:05:11 -07:00
Angelos Katharopoulos
0654543dcc
Add complex eigh (#2191) 2025-05-18 00:18:43 -07:00
Awni Hannun
48ef3e74e2
reduce vjp for all and any (#2193) 2025-05-16 08:38:49 -07:00
Cheng
7d4b378952
Include cuda_bf16.h for bfloat16 overloads (#2192)
* Include cuda_bf16.h for bfloat16 overloads

* Add NO_GPU_MULTI(Eig) in cuda backend
2025-05-16 06:44:42 -07:00
Jack Wind
7ff5c41e06
Add set_threadgroup_memory_length to CommandEncoder (#2183) 2025-05-16 00:28:03 -07:00
Awni Hannun
602f43e3d1
fix conv grad (#2187) 2025-05-15 19:20:36 -07:00
Awni Hannun
a2cadb8218
real and imag properties (#2189) 2025-05-15 18:17:50 -07:00