Cheng 
							
						 
					 
					
						
						
							
						
						9d10239af7 
					 
					
						
						
							
							[CUDA] Do vectorized store/load in binary ops ( #2330 )  
						
						
						
						
							
						
					 
					
						2025-07-07 08:44:14 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						19facd4b20 
					 
					
						
						
							
							Build with all cpu cores by default ( #2336 )  
						
						
						
						
							
						
					 
					
						2025-07-07 06:06:45 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						f5299f72cd 
					 
					
						
						
							
							Fix layernorm race condition ( #2340 )  
						
						
						
						
							
						
					 
					
						2025-07-07 06:06:01 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						0e0d9ac522 
					 
					
						
						
							
							[CUDA] Add MLX_CUDA_GRAPH_CACHE_SIZE env for setting graph cache size ( #2329 )  
						
						
						
						
							
						
					 
					
						2025-07-05 08:33:29 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8917022deb 
					 
					
						
						
							
							fix graphs for older cuda ( #2328 )  
						
						
						
						
							
						
					 
					
						2025-07-02 19:37:58 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec0d5db67b 
					 
					
						
						
							
							[CUDA] Switch to CUDA graphs ( #2317 )  
						
						
						
						
						* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
consistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment 
						
						
							
						
					 
					
						2025-07-02 15:59:13 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						e76e9b87f0 
					 
					
						
						
							
							Fix compilation error from integral_constant ( #2326 )  
						
						
						
						
							
						
					 
					
						2025-07-02 06:04:38 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cfb6a244ea 
					 
					
						
						
							
							allow parameters to be deleted ( #2325 )  
						
						
						
						
							
						
					 
					
						2025-07-01 21:27:23 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						58f3860306 
					 
					
						
						
							
							patch bump ( #2324 )  
						
						
						
						
							
 
						
					 
					
						2025-07-01 12:12:16 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dd4f53db63 
					 
					
						
						
							
							use fp32 for testing, add more complex ops ( #2322 )  
						
						
						
						
							
						
					 
					
						2025-07-01 07:30:00 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						3d5e17e507 
					 
					
						
						
							
							MLX_SWITCH macros to templates ( #2320 )  
						
						
						
						
							
						
					 
					
						2025-07-01 01:33:44 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						33bf1a244b 
					 
					
						
						
							
							Fix module update in strict mode ( #2321 )  
						
						
						
						
						* fix module update in strict mode
* allow GELU to be pickled 
						
						
							
						
					 
					
						2025-06-29 11:12:29 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						772f471ff2 
					 
					
						
						
							
							[CUDA] Fix reductions ( #2314 )  
						
						
						
						
							
						
					 
					
						2025-06-27 12:59:20 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						2c11d10f8d 
					 
					
						
						
							
							Split broadcast so it is always fused in compile ( #2318 )  
						
						
						
						
							
						
					 
					
						2025-06-26 22:08:18 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						656ed7f780 
					 
					
						
						
							
							Fix get 2d grid dims ( #2316 )  
						
						
						
						
							
						
					 
					
						2025-06-25 13:03:09 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						81bb9a2a9e 
					 
					
						
						
							
							Compile float64 functions on CPU ( #2311 )  
						
						
						
						
							
						
					 
					
						2025-06-24 10:18:52 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5adf185f86 
					 
					
						
						
							
							Fix update_modules() when providing a subset ( #2308 )  
						
						
						
						
							
						
					 
					
						2025-06-20 17:19:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c9a9180584 
					 
					
						
						
							
							Cuda perf tuning ( #2307 )  
						
						
						
						
						* perf tuning
* fix adding inputs arrays in matmul / sort
* format
* fix 
						
						
							
						
					 
					
						2025-06-20 14:50:57 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						76831ed83d 
					 
					
						
						
							
							Build CUDA release in Circle ( #2306 )  
						
						
						
						
						* cuda release
* add license 
						
						
							
						
					 
					
						2025-06-19 15:26:36 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						b3d7b85376 
					 
					
						
						
							
							Make ptx cache settable by environment variable ( #2304 )  
						
						
						
						
							
						
					 
					
						2025-06-17 23:55:56 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cad5c0241c 
					 
					
						
						
							
							[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )  
						
						
						
						
						* cuda synch properly waits for all tasks to finish and clear
* fix copy 
						
						
							
						
					 
					
						2025-06-17 12:03:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b8022c578a 
					 
					
						
						
							
							divmod, partition, sort fixes ( #2302 )  
						
						
						
						
							
						
					 
					
						2025-06-16 18:49:32 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bc53f8293f 
					 
					
						
						
							
							Cuda bug fixes 2 ( #2298 )  
						
						
						
						
						* more bug fixes
* more bug fixes
* format 
						
						
							
						
					 
					
						2025-06-16 13:14:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c552ff2451 
					 
					
						
						
							
							[CUDA] Fix back-end bugs and enable corresponding tests ( #2296 )  
						
						
						
						
						* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format 
						
						
							
						
					 
					
						2025-06-16 08:45:40 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4fda5fbdf9 
					 
					
						
						
							
							add python testing for cuda with ability to skip list of tests ( #2295 )  
						
						
						
						
							
						
					 
					
						2025-06-15 10:56:48 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						580776559b 
					 
					
						
						
							
							RoPE for CUDA ( #2293 )  
						
						
						
						
						* First working CUDA rope
* Fix random 
						
						
							
						
					 
					
						2025-06-15 06:08:07 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a14aaa7c9d 
					 
					
						
						
							
							Fix cuda arg reduce ( #2291 )  
						
						
						
						
							
						
					 
					
						2025-06-14 17:54:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a6d780154f 
					 
					
						
						
							
							fix cuda gemm for bf16 ( #2288 )  
						
						
						
						
							
						
					 
					
						2025-06-13 22:10:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6871e2eeb7 
					 
					
						
						
							
							fix cuda jit ( #2287 )  
						
						
						
						
							
						
					 
					
						2025-06-13 19:21:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8402a2acf4 
					 
					
						
						
							
							Fix complex power and print ( #2286 )  
						
						
						
						
						* fix complex power and print
* fix complex matmul shape 
						
						
							
						
					 
					
						2025-06-13 11:13:00 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						fddb6933e1 
					 
					
						
						
							
							Collection of refactors  ( #2274 )  
						
						
						
						
						* Refactor gemv into a function
* Refactor splitk step 1
* Refactor split k axpby
* Rearrange steel_gemm_regular
* Redirect steel_gemm_regular
* Add axpby routing to steel_matmul_regular
* Refactor AddMM step 1
* Redirect steel_gemm
* Update addmm
* Comments and format
* Some cleanup
* Add architecture gen to device
* Update no copy condition in normalization to account for axis size 1 
						
						
							
						
					 
					
						2025-06-13 10:44:56 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						c8b4787e4e 
					 
					
						
						
							
							CUDA backend: indexing ops ( #2277 )  
						
						
						
						
							
						
					 
					
						2025-06-12 21:44:19 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						2188199ff8 
					 
					
						
						
							
							[CUDA] ternary with select op ( #2283 )  
						
						
						
						
						* cuda ternary with select op
* comment + fix
* fix 
						
						
							
						
					 
					
						2025-06-12 20:24:43 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						aa07429bad 
					 
					
						
						
							
							Fix cuda build ( #2284 )  
						
						
						
						
							
						
					 
					
						2025-06-12 17:48:05 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						918761a25a 
					 
					
						
						
							
							[CUDA] RMSNorm and VJP ( #2280 )  
						
						
						
						
						* rms norm start
* nit 
						
						
							
						
					 
					
						2025-06-12 17:09:49 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						a4fc671d3e 
					 
					
						
						
							
							CUDA backend: compile ( #2276 )  
						
						
						
						
						* CUDA backend: compile
* Rename kernels/ to device/ 
						
						
							
						
					 
					
						2025-06-12 17:08:39 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f5f65ef48c 
					 
					
						
						
							
							Make sliceUpdate general ( #2282 )  
						
						
						
						
						* Make sliceUpdate general
* fix 
						
						
							
						
					 
					
						2025-06-12 16:48:54 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						c2dd81a8aa 
					 
					
						
						
							
							Fix warnings from latest CUDA toolkit ( #2275 )  
						
						
						
						
							
						
					 
					
						2025-06-12 06:03:01 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						d7e680ffe4 
					 
					
						
						
							
							CUDA backend: layernorm ( #2271 )  
						
						
						
						
							
						
					 
					
						2025-06-11 15:48:32 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						c371baf53a 
					 
					
						
						
							
							CUDA backend: softmax ( #2272 )  
						
						
						
						
							
						
					 
					
						2025-06-11 13:55:22 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						ccf78f566c 
					 
					
						
						
							
							CUDA backend: argreduce ( #2270 )  
						
						
						
						
							
						
					 
					
						2025-06-11 13:26:17 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						c9fa68664a 
					 
					
						
						
							
							CUDA backend: reduce ( #2269 )  
						
						
						
						
							
						
					 
					
						2025-06-11 11:22:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c35f4d089a 
					 
					
						
						
							
							start cuda circle config ( #2256 )  
						
						
						
						
						* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config 
						
						
							
						
					 
					
						2025-06-10 21:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						8590c0941e 
					 
					
						
						
							
							Add load_safe to the general conv loaders ( #2258 )  
						
						
						
						
							
						
					 
					
						2025-06-10 20:58:16 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						095163b8d1 
					 
					
						
						
							
							Fix building cpp benchmarks on Linux ( #2268 )  
						
						
						
						
							
						
					 
					
						2025-06-10 17:10:24 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						99c33d011d 
					 
					
						
						
							
							rebase + nit ( #2260 )  
						
						
						
						
						Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-06-10 10:51:51 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						62fecf3e13 
					 
					
						
						
							
							fix conv export ( #2265 )  
						
						
						
						
							
						
					 
					
						2025-06-10 09:34:01 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						7c4eb5d03e 
					 
					
						
						
							
							CUDA backend: random ( #2261 )  
						
						
						
						
							
						
					 
					
						2025-06-10 08:59:56 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						bae9a6b404 
					 
					
						
						
							
							CUDA backend: sort ( #2262 )  
						
						
						
						
						Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-06-10 08:59:47 -07:00 
						 
				 
			
				
					
						
							
							
								Christopher Fleetwood 
							
						 
					 
					
						
						
							
						
						004c1d8ef2 
					 
					
						
						
							
							Report number of missing parameters ( #2264 )  
						
						
						
						
						* chore: inform
* chore: format
---------
Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com>
						
						
							
						
					 
					
						2025-06-10 06:37:50 -07:00