Awni Hannun
3f0b33ecd5
use proper version
2025-07-08 21:30:18 +00:00
Awni Hannun
a88103a649
docs update
2025-07-08 21:30:18 +00:00
Awni Hannun
d0c2c3d1ca
docs update
2025-07-08 21:30:18 +00:00
Awni Hannun
4c160546aa
docs update
2025-07-08 21:30:18 +00:00
Awni Hannun
829d18cae7
docs update
2025-07-08 21:30:18 +00:00
Awni Hannun
798eab9f5e
remove unneeded files in docs
2025-07-08 21:30:18 +00:00
Awni Hannun
3c372a8f6f
update docs
2025-07-08 21:30:18 +00:00
Awni Hannun
2010d815aa
docs update
2025-07-08 21:30:17 +00:00
Awni Hannun
51399f9a37
docs up
2025-07-08 21:30:17 +00:00
Awni Hannun
de0fc12d34
docs up
2025-07-08 21:30:17 +00:00
Awni Hannun
37fdd687b0
docs update
2025-07-08 21:30:17 +00:00
Awni Hannun
6d35eb1742
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
ffe51a69ca
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
087651b1b1
update docs
2025-07-08 21:30:17 +00:00
Awni Hannun
efda283748
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
1e6c2e6cfe
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
cb7a2041a5
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
41bb77f7e3
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
14d44f5869
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
dc37663f6b
docs
2025-07-08 21:30:17 +00:00
Awni Hannun
527f486e59
docs
2025-07-08 21:30:16 +00:00
Awni Hannun
96b48b3154
docs
2025-07-08 21:30:16 +00:00
Awni Hannun
075ca7feed
docs
2025-07-08 21:30:16 +00:00
Awni Hannun
fb4e8b896b
patch bump ( #2343 )
2025-07-08 14:26:07 -07:00
Cheng
2ca533b279
Fix compilation with CUDA 11 ( #2331 )
2025-07-07 20:00:43 -07:00
Angelos Katharopoulos
4a9b29a875
MoE backward improvements ( #2335 )
2025-07-07 17:59:53 -07:00
Awni Hannun
a4fcc893cd
auto build linux release ( #2341 )
2025-07-07 09:29:23 -07:00
Cheng
9d10239af7
[CUDA] Do vectorized store/load in binary ops ( #2330 )
2025-07-07 08:44:14 -07:00
Cheng
19facd4b20
Build with all cpu cores by default ( #2336 )
2025-07-07 06:06:45 -07:00
Angelos Katharopoulos
f5299f72cd
Fix layernorm race condition ( #2340 )
2025-07-07 06:06:01 -07:00
Cheng
0e0d9ac522
[CUDA] Add MLX_CUDA_GRAPH_CACHE_SIZE env for setting graph cache size ( #2329 )
2025-07-05 08:33:29 -07:00
Awni Hannun
8917022deb
fix graphs for older cuda ( #2328 )
2025-07-02 19:37:58 -07:00
Awni Hannun
ec0d5db67b
[CUDA] Switch to CUDA graphs ( #2317 )
...
* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
consistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment
2025-07-02 15:59:13 -07:00
Cheng
e76e9b87f0
Fix compilation error from integral_constant ( #2326 )
2025-07-02 06:04:38 -07:00
Awni Hannun
cfb6a244ea
allow parameters to be deleted ( #2325 )
2025-07-01 21:27:23 -07:00
Awni Hannun
58f3860306
patch bump ( #2324 )
2025-07-01 12:12:16 -07:00
Awni Hannun
dd4f53db63
use fp32 for testing, add more complex ops ( #2322 )
2025-07-01 07:30:00 -07:00
Angelos Katharopoulos
3d5e17e507
MLX_SWITCH macros to templates ( #2320 )
2025-07-01 01:33:44 -07:00
Awni Hannun
33bf1a244b
Fix module update in strict mode ( #2321 )
...
* fix module update in strict mode
* allow GELU to be pickled
2025-06-29 11:12:29 -07:00
Angelos Katharopoulos
772f471ff2
[CUDA] Fix reductions ( #2314 )
2025-06-27 12:59:20 -07:00
Angelos Katharopoulos
2c11d10f8d
Split broadcast so it is always fused in compile ( #2318 )
2025-06-26 22:08:18 -07:00
Angelos Katharopoulos
656ed7f780
Fix get 2d grid dims ( #2316 )
2025-06-25 13:03:09 -07:00
Awni Hannun
81bb9a2a9e
Compile float64 functions on CPU ( #2311 )
2025-06-24 10:18:52 -07:00
Angelos Katharopoulos
5adf185f86
Fix update_modules() when providing a subset ( #2308 )
2025-06-20 17:19:46 -07:00
Awni Hannun
c9a9180584
Cuda perf tuning ( #2307 )
...
* perf tuning
* fix adding inputs arrays in matmul / sort
* format
* fix
2025-06-20 14:50:57 -07:00
Awni Hannun
76831ed83d
Build CUDA release in Circle ( #2306 )
...
* cuda release
* add license
2025-06-19 15:26:36 -07:00
Angelos Katharopoulos
b3d7b85376
Make ptx cache settable by environment variable ( #2304 )
2025-06-17 23:55:56 -07:00
Awni Hannun
cad5c0241c
[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )
...
* cuda synch properly waits for all tasks to finish and clear
* fix copy
2025-06-17 12:03:25 -07:00
Awni Hannun
b8022c578a
divmod, partition, sort fixes ( #2302 )
2025-06-16 18:49:32 -07:00
Awni Hannun
bc53f8293f
Cuda bug fixes 2 ( #2298 )
...
* more bug fixes
* more bug fixes
* format
2025-06-16 13:14:46 -07:00