Awni Hannun | c098245664 | docs update | 2025-06-04 01:01:49 +00:00
Awni Hannun | 981bf7ae2b | docs update | 2025-06-04 01:01:49 +00:00
Awni Hannun | ce95a29690 | docs update | 2025-06-04 01:01:49 +00:00
Awni Hannun | 9e69a72b8c | docs update | 2025-06-04 01:01:48 +00:00
Awni Hannun | 17470bf630 | remove unneeded files in docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | a693e6e1d8 | update docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | fd34610634 | docs update | 2025-06-04 01:01:48 +00:00
Awni Hannun | 84bebc2161 | docs up | 2025-06-04 01:01:48 +00:00
Awni Hannun | 9882295582 | docs up | 2025-06-04 01:01:48 +00:00
Awni Hannun | ebd913400a | docs update | 2025-06-04 01:01:48 +00:00
Awni Hannun | 217cdf3fc9 | docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | 43cd655ba1 | docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | 8c406bcb9b | update docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | 01489e172d | docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | 616449e363 | docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | a66e6d3214 | docs | 2025-06-04 01:01:48 +00:00
Awni Hannun | d3d0ad9564 | docs | 2025-06-04 01:01:47 +00:00
Awni Hannun | a60a600c6a | docs | 2025-06-04 01:01:47 +00:00
Awni Hannun | e84ebcf0b9 | docs | 2025-06-04 01:01:47 +00:00
Awni Hannun | 372f2ac025 | docs | 2025-06-04 01:01:47 +00:00
Awni Hannun | 80322b562e | docs | 2025-06-04 01:01:47 +00:00
Awni Hannun | fbd10a48d4 | docs | 2025-06-04 01:01:47 +00:00
Angelos Katharopoulos | aede70e81d | Perf regression fix (#2243) | 2025-06-03 17:55:12 -07:00
Cheng | 85a8beb5e4 | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00
Cheng | 0bb89e9e5f | Share more common code in Compiled (#2240) | 2025-06-03 16:48:50 -07:00
  * Share more common code in Compiled
  * Remove build_lib_name
Cheng | 5685ceb3c7 | Avoid invoking allocator::malloc when creating CUDA event (#2232) | 2025-06-03 16:48:40 -07:00
Suryash Malviya | 0408ba0a76 | Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm (#2220) | 2025-06-02 15:58:46 -07:00
  * Implementing Complex Matmul using Karatsuba Algorithm
  * Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
  * fix
  ---------
  Co-authored-by: Awni Hannun <awni@apple.com>
Awni Hannun | cbad6c3093 | version (#2237) | 2025-06-02 15:58:33 -07:00
Cheng | 1b021f6984 | Fast primitives decide when to use the fallback (#2216) | 2025-06-02 13:26:37 -07:00
Cheng | 95b7551d65 | Do not check event.is_signaled() in eval_impl (#2230) | 2025-06-02 13:23:34 -07:00
Cheng | db5a7c6192 | Add memory cache to CUDA backend (#2221) | 2025-05-30 12:12:54 -07:00
  * Move BufferCache out of allocator
  * Add memory cache to cuda backend allocator
  * Simplify BufferCache assuming buf can not be null
Awni Hannun | 6ef2f67e7f | 5bit quants (#2226) | 2025-05-30 12:12:10 -07:00
Cheng | f76ee1ffd2 | Move some dims utils to common (#2223) | 2025-05-29 06:48:30 -07:00
Cheng | 54a71f270a | Remove unused defines (#2217) | 2025-05-23 06:14:58 -07:00
Awni Hannun | 55b4062dd8 | copyright in docs (#2214) | 2025-05-21 17:13:04 -07:00
Cheng | 79071bfba4 | Fix out-of-bounds default value in logsumexp/softmax (#2213) | 2025-05-21 07:25:16 -07:00
Cheng | 7774b87cbd | Remove redundant simd_sum in logsumexp (#2210) | 2025-05-21 07:25:03 -07:00
Cheng | 35c87741cf | Build for compute capability 70 instead of 75 (#2209) | 2025-05-20 19:42:48 -07:00
Jack Wind | 4cbe605214 | Feat: Allow per-target Metal debug flags (#2201) | 2025-05-20 10:22:26 -07:00
  * feat: allow per-target Metal debug flags
  * formatting fix
Clement Liaw | ab8883dd55 | include mlx::core::version() symbols in the mlx static library (#2207) | 2025-05-20 07:39:11 -07:00
Awni Hannun | eebe73001a | fix large arg reduce (#2206) | 2025-05-19 13:10:44 -07:00
Angelos Katharopoulos | 0359bf02c9 | Nearest upsample (#2202) | 2025-05-19 11:23:38 -07:00
Cheng | 237f9e58a8 | Fix BEFORE keyword in target_include_directories (#2204) | 2025-05-19 06:10:44 -07:00
Awni Hannun | 8576e6fe36 | fix conv2d bug + faster conv 1d (#2195) | 2025-05-18 06:05:11 -07:00
  * fix conv2d bug + faster conv 1d
  * revert sort + flaky test
Angelos Katharopoulos | 0654543dcc | Add complex eigh (#2191) | 2025-05-18 00:18:43 -07:00
Awni Hannun | 48ef3e74e2 | reduce vjp for all and any (#2193) | 2025-05-16 08:38:49 -07:00
Cheng | 7d4b378952 | Include cuda_bf16.h for bfloat16 overloads (#2192) | 2025-05-16 06:44:42 -07:00
  * Include cuda_bf16.h for bfloat16 overloads
  * Add NO_GPU_MULTI(Eig) in cuda backend
Jack Wind | 7ff5c41e06 | Add set_threadgroup_memory_length to CommandEncoder (#2183) | 2025-05-16 00:28:03 -07:00
Awni Hannun | 602f43e3d1 | fix conv grad (#2187) | 2025-05-15 19:20:36 -07:00
Awni Hannun | a2cadb8218 | real and imag properties (#2189) | 2025-05-15 18:17:50 -07:00