c5460762e7  Artur Antonov  2025-08-31 21:29:30 -07:00
    Fix AdamW weight_decay default value in docstring (#2557)

8ce49cd39e  Awni Hannun  2025-08-29 10:06:15 -07:00
    fix quantized vjp for mxfp4 (#2555)

111f1e71af  Awni Hannun  2025-08-28 21:26:30 -07:00
    Faster contiguous gather for indices in the first axis (#2552)
    * faster contiguous gather for indices in the first axis
    * work per thread > 1
    * angelos suggestion for scales / biases

70560b6bd5  Awni Hannun  2025-08-28 06:45:26 -07:00
    Add mode parameter for quantization (#2499)
    * add mode parameter for quantization
    * mxfp4 quantize/dequantize + start of optional biases
    * mxfp4 works
    * speedup
    * cpu mxfp4
    * fix
    * fix test tol
    * fix
    * refactor
    * add quant mode enum

7ef8a6f2d5  Awni Hannun  2025-08-27 19:48:43 -07:00
    [CUDA] fix sort (#2550)
    * [CUDA] fix sort
    * fix test

5458d43247  Awni Hannun  2025-08-26 14:24:47 -07:00
    add load with path tests (#2543)

3dcb286baf  Awni Hannun  2025-08-25 15:56:29 -07:00
    Remove stream from average grads so it uses default (#2532)
    * Remove stream from average grads so it uses default
    * comment

4822c3dbe9  Cheng  2025-08-26 07:31:39 +09:00
    [CUDA] Implement DynamicSlice/DynamicSliceUpdate (#2533)
    * Move DynamicSlice to gpu/primitives
    * Implement compute_dynamic_offset in CUDA

db14e29a0b  Awni Hannun  2025-08-25 14:58:49 -07:00
    allow pathlib.Path to save/load functions (#2541)

068a4612e9  Awni Hannun  2025-08-22 12:24:27 -07:00
    nccl default for backend=any (#2528)
    * nccl default for backend=any
    * check num gpus + ensure row contiguous for all reduce
    * comment

f93f87c802  Awni Hannun  2025-08-21 17:57:49 -07:00
    nccl dep + default for cuda (#2526)

9392fc3f88  Anastasiia Filippova  2025-08-21 11:56:15 -07:00
    NCCL backend (#2476)

e843c4d8d5  Awni Hannun  2025-08-21 06:46:01 -07:00
    fix power (#2523)

e397177f6e  Angelos Katharopoulos  2025-08-20 17:20:22 -07:00
    Custom cuda kernel (#2517)

f4c8888cbe  Cheng  2025-08-21 08:55:26 +09:00
    [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521)

25c1e03205  Angelos Katharopoulos  2025-08-20 08:03:29 -07:00
    Fix overflow in large filter small channels (#2520)

ac85ddfdb7  Cheng  2025-08-20 10:06:22 +09:00
    [CUDA] Add GEMM-based fallback convolution kernels (#2511)
    * Add gemm_conv
    * Add gemm_grouped_conv

e7c6e1db82  Awni Hannun  2025-08-18 08:33:38 -07:00
    no segfault with uninitialized array.at (#2514)

c5fcd5b61b  Awni Hannun  2025-08-18 06:45:59 -07:00
    fix custom kernel test (#2510)

1ba18ff7d9  Cheng  2025-08-16 10:09:18 +09:00
    [CUDA] Fix conv grads with groups (#2495)
    * Put reshape utils in one file
    * [CUDA] Fix conv grads with groups
    * Put the reshape utils in gpu/copy.h

728d4db582  Luca Vivona  2025-08-06 15:34:59 -07:00
    Support destination arg in tree flatten/unflatten (#2450)

fa89f0b150  Awni Hannun  2025-08-05 06:27:40 -07:00
    faster gather qmm sorted test (#2463)

828c5f1137  Cheng  2025-08-05 09:41:03 +09:00
    Use SmallVector for shapes and strides (#2454)
    * Use SmallVector for shapes and strides
    * Convert SmallVector to tuple

0b807893a7  Awni Hannun  2025-08-04 16:14:18 -07:00
    fix wraps compile (#2461)

86c6a15571  Cheng  2025-08-01 09:54:05 +09:00
    [CUDA] Backward convolution (#2431)

8b25ce62d5  junpeiz  2025-07-31 11:06:26 -07:00
    Add tests for export including control flow models and quantized models (#2430)
    * Add tests for export, including control flow export and quantized model export.
    * Skip quantization related test for CUDA backend.

d32519c8ee  Awni Hannun  2025-07-30 14:23:01 -07:00
    fix gemv regression (#2445)

b405591249  Awni Hannun  2025-07-30 09:37:44 -07:00
    fix circular reference (#2443)

ef631d63af  Awni Hannun  2025-07-29 13:12:00 -07:00
    faster rms norm (#2433)

4ad53414dd  Awni Hannun  2025-07-25 15:20:29 -07:00
    fix cuda pypi package (#2423)
    * fix cuda pypi package
    * patch bump

dcb8319f3d  Awni Hannun  2025-07-25 12:13:19 -07:00
    update install docs and requirements (#2419)

5597fa089c  Awni Hannun  2025-07-25 11:50:24 -07:00
    Fix qvm splitk (#2415)

7d9d6ef456  Skonor  2025-07-24 16:40:45 -07:00
    docs: fix adam and adamw eps placement (#2416)
    Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com>

6f5874a2f2  Cheng  2025-07-25 08:12:10 +09:00
    [CUDA] Initial implementation of Convolution with cuDNN (#2385)
    * Link with cuDNN
    * Initial implementation
    * Remove backend apis
    * Fix recording cudnn conv
    * More unused backend apis
    * Fix C++ conv tests
    * include cudnn as python dep
    * Install libcudnn9-dev-cuda-12 in CI
    * cudnn only accepts contiguous inputs
    * Switch to backend apis
    * Plan needs to be kept alive
    * Turn off tf32
    * Add cache
    * Test the native cuda graph api
    * Set cudnn stream before execution
    * Make LRUCache more like a normal container
    * Do error check for cublas handle
    * Zero-initilizing array
    * Use tf32 for conv
    * Skip TestConv.test_torch_conv_2D test
    ---------
    Co-authored-by: Awni Hannun <awni@apple.com>

d1f4d291e8  Awni Hannun  2025-07-23 16:54:19 -07:00
    Fix uv install and add dev release (#2411)
    * fix uv install and add dev release
    * fix docstring
    * pin cuda deps
    * cuda release on cpu-only machine

e1840853ce  Awni Hannun  2025-07-23 16:37:03 -07:00
    full row mask in sdpa consistently gives nan (#2406)

28d068bce6  Fangjun Kuang  2025-07-23 06:10:50 -07:00
    Fix an error in the comment for mx.dequantize (#2409)

63f663d9c6  Awni Hannun  2025-07-18 21:02:16 -07:00
    fix cuda manylinux version to match others (#2388)

deee214a95  Gökdeniz Gülmez  2025-07-18 12:25:28 -07:00
    Adding support for the Muon Optimizer (#1914)
    * initial commit with workong optmimizer
    * update ACKNOWLEDGMENTS.md
    * nits and adding it to test
    * nits
    * G.astype(mx.bfloat16) to G.astype(G.dtype)
    * G.ndim >= 2 to assert G.ndim == 2
    * remove coments
    * replace with mx.addmm
    * remove comments
    * format
    * nits
    * match muon
    * fix addmm
    ---------
    Co-authored-by: Awni Hannun <awni@apple.com>

f409b229a4  Awni Hannun  2025-07-16 11:25:24 -07:00
    fix ring distributed test (#2380)

d7734edd9f  Awni Hannun  2025-07-15 18:19:47 -07:00
    fix complex reduce + nan propagation in min and max (#2377)

f0a0b077a0  Awni Hannun  2025-07-14 17:17:33 -07:00
    Install linux with mlx[cuda] and mlx[cpu] (#2356)
    * install linux with mlx[cuda] and mlx[cpu]
    * temp for testing
    * cleanup circle, fix cuda repair
    * update circle
    * update circle
    * decouple python bindings from core libraries

49114f28ab  Awni Hannun  2025-07-14 17:16:18 -07:00
    fix flaky test (#2371)

e7d2ebadd2  Awni Hannun  2025-07-14 15:45:44 -07:00
    [CUDA] Affine quantize (#2354)
    * affine quantize and dequantize kernels
    * format
    * fix
    * format

d34f887abc  Cheng  2025-07-14 14:06:35 -07:00
    Add Primitive::name and remove Primitive::print (#2365)

5201df5030  Angelos Katharopoulos  2025-07-14 13:11:16 -07:00
    Fix imag() vjp (#2367)

8347575ba1  Cheng  2025-07-10 16:54:12 -07:00
    [CUDA] Implement Scan kernel (#2347)
    * Contiguous scan
    * Strided scan
    * Enable tests
    * Fix failing logaddexp test
    * Use cexpf in Metal

0eb035b4b1  Angelos Katharopoulos  2025-07-10 11:14:42 -07:00
    Fix type promotion in Adam with bias correction (#2350)

8c7bc30ce4  jhavukainen  2025-07-10 06:20:43 -07:00
    Align mlx::core::min op nan propagation with NumPy (#2346)

8b9a3f3cea  jhavukainen  2025-07-09 11:26:27 -07:00
    Align mlx::core::max op nan propagation with NumPy (#2339)
    * Make max op NaN propagation rules align with numpy
    * Adding benchmarks and testing for max op nanpropagation
    * Pre-commit formatting
    * Fix max complex64 nan propagation and add test
    * Improve the cpp unittest
    * Only check nans on non-integral types in simd_reduce_impl.
    * Cleanup using namespace alias
    * Add cpu Max nanpropagation. Fix a small fib in cpu max dispatch data types for int8/int16.
    * Make the max nanpropagation test more meaningful for integer types
    * Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nanpropagation tests, fix cuda implementation in a separate PR.