Awni Hannun 
							
						 
					 
					
						
						
							
						
						fa89f0b150 
					 
					
						
						
							
							faster gather qmm sorted test (#2463)
						
						
						
						
					 
					
						2025-08-05 06:27:40 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						828c5f1137 
					 
					
						
						
							
							Use SmallVector for shapes and strides (#2454)
						
						
						
						
						* Use SmallVector for shapes and strides
* Convert SmallVector to tuple 
						
						
					 
					
						2025-08-05 09:41:03 +09:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						0b807893a7 
					 
					
						
						
							
							fix wraps compile (#2461)
						
						
						
						
					 
					
						2025-08-04 16:14:18 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						86c6a15571 
					 
					
						
						
							
							[CUDA] Backward convolution (#2431)
						
						
						
						
					 
					
						2025-08-01 09:54:05 +09:00 
						 
				 
			
				
					
						
							
							
								junpeiz 
							
						 
					 
					
						
						
							
						
						8b25ce62d5 
					 
					
						
						
							
							Add tests for export including control flow models and quantized models (#2430)
						
						
						
						
						* Add tests for export, including control flow export and quantized model export.
* Skip quantization related test for CUDA backend. 
						
						
					 
					
						2025-07-31 11:06:26 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d32519c8ee 
					 
					
						
						
							
							fix gemv regression (#2445)
						
						
						
						
					 
					
						2025-07-30 14:23:01 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b405591249 
					 
					
						
						
							
							fix circular reference (#2443)
						
						
						
						
					 
					
						2025-07-30 09:37:44 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ef631d63af 
					 
					
						
						
							
							faster rms norm (#2433)
						
						
						
						
					 
					
						2025-07-29 13:12:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4ad53414dd 
					 
					
						
						
							
							fix cuda pypi package (#2423)
						
						
						
						
						* fix cuda pypi package
* patch bump 
						
						
					 
					
						2025-07-25 15:20:29 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dcb8319f3d 
					 
					
						
						
							
							update install docs and requirements (#2419)
						
						
						
						
					 
					
						2025-07-25 12:13:19 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5597fa089c 
					 
					
						
						
							
							Fix qvm splitk (#2415)
						
						
						
						
					 
					
						2025-07-25 11:50:24 -07:00 
						 
				 
			
				
					
						
							
							
								Skonor 
							
						 
					 
					
						
						
							
						
						7d9d6ef456 
					 
					
						
						
							
							docs: fix adam and adamw eps placement (#2416)
						
						
						
						
						Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com>
						
						
					 
					
						2025-07-24 16:40:45 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						6f5874a2f2 
					 
					
						
						
							
							[CUDA] Initial implementation of Convolution with cuDNN (#2385)
						
						
						
						
						* Link with cuDNN
* Initial implementation
* Remove backend apis
* Fix recording cudnn conv
* More unused backend apis
* Fix C++ conv tests
* include cudnn as python dep
* Install libcudnn9-dev-cuda-12 in CI
* cudnn only accepts contiguous inputs
* Switch to backend apis
* Plan needs to be kept alive
* Turn off tf32
* Add cache
* Test the native cuda graph api
* Set cudnn stream before execution
* Make LRUCache more like a normal container
* Do error check for cublas handle
* Zero-initializing array
* Use tf32 for conv
* Skip TestConv.test_torch_conv_2D test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
					 
					
						2025-07-25 08:12:10 +09:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d1f4d291e8 
					 
					
						
						
							
							Fix uv install and add dev release (#2411)
						
						
						
						
						* fix uv install and add dev release
* fix docstring
* pin cuda deps
* cuda release on cpu-only machine 
						
						
					 
					
						2025-07-23 16:54:19 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e1840853ce 
					 
					
						
						
							
							full row mask in sdpa consistently gives nan (#2406)
						
						
						
						
					 
					
						2025-07-23 16:37:03 -07:00 
						 
				 
			
				
					
						
							
							
								Fangjun Kuang 
							
						 
					 
					
						
						
							
						
						28d068bce6 
					 
					
						
						
							
							Fix an error in the comment for mx.dequantize (#2409)
						
						
						
						
					 
					
						2025-07-23 06:10:50 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						63f663d9c6 
					 
					
						
						
							
							fix cuda manylinux version to match others (#2388)
						
						
						
						
					 
					
						2025-07-18 21:02:16 -07:00 
						 
				 
			
				
					
						
							
							
								Gökdeniz Gülmez 
							
						 
					 
					
						
						
							
						
						deee214a95 
					 
					
						
						
							
							Adding support for the Muon Optimizer (#1914)
						
						
						
						
						* initial commit with working optimizer
* update ACKNOWLEDGMENTS.md
* nits and adding it to test
* nits
* G.astype(mx.bfloat16) to G.astype(G.dtype)
* G.ndim >= 2 to assert G.ndim == 2
* remove comments
* replace with mx.addmm
* remove comments
* format
* nits
* match muon
* fix addmm
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
					 
					
						2025-07-18 12:25:28 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f409b229a4 
					 
					
						
						
							
							fix ring distributed test (#2380)
						
						
						
						
					 
					
						2025-07-16 11:25:24 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d7734edd9f 
					 
					
						
						
							
							fix complex reduce + nan propagation in min and max (#2377)
						
						
						
						
					 
					
						2025-07-15 18:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f0a0b077a0 
					 
					
						
						
							
							Install linux with mlx[cuda] and mlx[cpu] (#2356)
						
						
						
						
						* install linux with mlx[cuda] and mlx[cpu]
* temp for testing
* cleanup circle, fix cuda repair
* update circle
* update circle
* decouple python bindings from core libraries 
						
						
					 
					
						2025-07-14 17:17:33 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						49114f28ab 
					 
					
						
						
							
							fix flaky test (#2371)
						
						
						
						
					 
					
						2025-07-14 17:16:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e7d2ebadd2 
					 
					
						
						
							
							[CUDA] Affine quantize (#2354)
						
						
						
						
						* affine quantize and dequantize kernels
* format
* fix
* format 
						
						
					 
					
						2025-07-14 15:45:44 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						d34f887abc 
					 
					
						
						
							
							Add Primitive::name and remove Primitive::print (#2365)
						
						
						
						
					 
					
						2025-07-14 14:06:35 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5201df5030 
					 
					
						
						
							
							Fix imag() vjp (#2367)
						
						
						
						
					 
					
						2025-07-14 13:11:16 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						8347575ba1 
					 
					
						
						
							
							[CUDA] Implement Scan kernel (#2347)
						
						
						
						
						* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal 
						
						
					 
					
						2025-07-10 16:54:12 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0eb035b4b1 
					 
					
						
						
							
							Fix type promotion in Adam with bias correction (#2350)
						
						
						
						
					 
					
						2025-07-10 11:14:42 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8c7bc30ce4 
					 
					
						
						
							
							Align mlx::core::min op nan propagation with NumPy (#2346)
						
						
						
						
					 
					
						2025-07-10 06:20:43 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8b9a3f3cea 
					 
					
						
						
							
							Align mlx::core::max op nan propagation with NumPy (#2339)
						
						
						
						
						* Make max op NaN propagation rules align with numpy
* Adding benchmarks and testing for max op NaN propagation
* Pre-commit formatting
* Fix max complex64 nan propagation and add test
* Improve the cpp unittest
* Only check nans on non-integral types in simd_reduce_impl.
* Cleanup using namespace alias
* Add cpu Max NaN propagation. Fix a small fib in cpu max dispatch data types for int8/int16.
* Make the max NaN propagation test more meaningful for integer types
* Remove tuple unpacking syntax to comply with earlier Python versions. Add cuda skip to NaN propagation tests, fix cuda implementation in a separate PR.
						
						
					 
					
						2025-07-09 11:26:27 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						4a9b29a875 
					 
					
						
						
							
							MoE backward improvements (#2335)
						
						
						
						
					 
					
						2025-07-07 17:59:53 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a4fcc893cd 
					 
					
						
						
							
							auto build linux release (#2341)
						
						
						
						
					 
					
						2025-07-07 09:29:23 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						19facd4b20 
					 
					
						
						
							
							Build with all cpu cores by default (#2336)
						
						
						
						
					 
					
						2025-07-07 06:06:45 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec0d5db67b 
					 
					
						
						
							
							[CUDA] Switch to CUDA graphs (#2317)
						
						
						
						
						* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
consistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment 
						
						
					 
					
						2025-07-02 15:59:13 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cfb6a244ea 
					 
					
						
						
							
							allow parameters to be deleted (#2325)
						
						
						
						
					 
					
						2025-07-01 21:27:23 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dd4f53db63 
					 
					
						
						
							
							use fp32 for testing, add more complex ops (#2322)
						
						
						
						
					 
					
						2025-07-01 07:30:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						33bf1a244b 
					 
					
						
						
							
							Fix module update in strict mode (#2321)
						
						
						
						
						* fix module update in strict mode
* allow GELU to be pickled 
						
						
					 
					
						2025-06-29 11:12:29 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						772f471ff2 
					 
					
						
						
							
							[CUDA] Fix reductions (#2314)
						
						
						
						
					 
					
						2025-06-27 12:59:20 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						2c11d10f8d 
					 
					
						
						
							
							Split broadcast so it is always fused in compile (#2318)
						
						
						
						
					 
					
						2025-06-26 22:08:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						81bb9a2a9e 
					 
					
						
						
							
							Compile float64 functions on CPU (#2311)
						
						
						
						
					 
					
						2025-06-24 10:18:52 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5adf185f86 
					 
					
						
						
							
							Fix update_modules() when providing a subset (#2308)
						
						
						
						
					 
					
						2025-06-20 17:19:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						76831ed83d 
					 
					
						
						
							
							Build CUDA release in Circle (#2306)
						
						
						
						
						* cuda release
* add license 
						
						
					 
					
						2025-06-19 15:26:36 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cad5c0241c 
					 
					
						
						
							
							[CUDA] synch properly waits for all tasks to finish and clear (#2303)
						
						
						
						
						* cuda synch properly waits for all tasks to finish and clear
* fix copy 
						
						
					 
					
						2025-06-17 12:03:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b8022c578a 
					 
					
						
						
							
							divmod, partition, sort fixes (#2302)
						
						
						
						
					 
					
						2025-06-16 18:49:32 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bc53f8293f 
					 
					
						
						
							
							Cuda bug fixes 2 (#2298)
						
						
						
						
						* more bug fixes
* more bug fixes
* format 
						
						
					 
					
						2025-06-16 13:14:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c552ff2451 
					 
					
						
						
							
							[CUDA] Fix back-end bugs and enable corresponding tests (#2296)
						
						
						
						
						* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format 
						
						
					 
					
						2025-06-16 08:45:40 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4fda5fbdf9 
					 
					
						
						
							
							add python testing for cuda with ability to skip list of tests (#2295)
						
						
						
						
					 
					
						2025-06-15 10:56:48 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8402a2acf4 
					 
					
						
						
							
							Fix complex power and print (#2286)
						
						
						
						
						* fix complex power and print
* fix complex matmul shape 
						
						
					 
					
						2025-06-13 11:13:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c35f4d089a 
					 
					
						
						
							
							start cuda circle config (#2256)
						
						
						
						
						* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config 
						
						
					 
					
						2025-06-10 21:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						8590c0941e 
					 
					
						
						
							
							Add load_safe to the general conv loaders (#2258)
						
						
						
						
					 
					
						2025-06-10 20:58:16 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						62fecf3e13 
					 
					
						
						
							
							fix conv export (#2265)
						
						
						
						
					 
					
						2025-06-10 09:34:01 -07:00