Awni Hannun 
							
						 
					 
					
						
						
							
						
						70dc336785 
					 
					
						
						
							
Test on cuda 12.2 and 12.9 (#2413)
						
						
						
						
							
						
					 
					
						2025-07-24 06:06:15 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4e504039f5 
					 
					
						
						
							
[Metal] Release metal events (#2412)
						
						
						
						
						* release metal events
* fix
* fix 
						
						
							
						
					 
					
						2025-07-23 19:53:42 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d1f4d291e8 
					 
					
						
						
							
Fix uv install and add dev release (#2411)
						
						
						
						
						* fix uv install and add dev release
* fix docstring
* pin cuda deps
* cuda release on cpu-only machine 
						
						
							
						
					 
					
						2025-07-23 16:54:19 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e1840853ce 
					 
					
						
						
							
full row mask in sdpa consistently gives nan (#2406)
						
						
						
						
							
						
					 
					
						2025-07-23 16:37:03 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						0f5ce173da 
					 
					
						
						
							
[CUDA] --compress-mode requires CUDA 12.8 (#2407)
						
						
						
						
							
						
					 
					
						2025-07-23 06:11:11 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						588854195f 
					 
					
						
						
							
Remove unused code in Convolution::vjp (#2408)
						
						
						
						
							
						
					 
					
						2025-07-23 06:11:00 -07:00 
						 
				 
			
				
					
						
							
							
								Fangjun Kuang 
							
						 
					 
					
						
						
							
						
						28d068bce6 
					 
					
						
						
							
Fix an error in the comment for mx.dequantize (#2409)
						
						
						
						
							
						
					 
					
						2025-07-23 06:10:50 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d107d8d495 
					 
					
						
						
							
add cuda gemv (#2400)
						
						
						
						
							
						
					 
					
						2025-07-22 08:24:13 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1e496ddb82 
					 
					
						
						
							
[CUDA] Simplify allocator (#2392)
						
						
						
						
* simplify allocator and fix race with small pool
* Don't use shared event in worker
* use cuda buffer in small pool
* comment
* comment 
						
						
							
						
					 
					
						2025-07-22 08:24:01 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						74eccbf3fa 
					 
					
						
						
							
use size option in binary (#2399)
						
						
						
						
							
						
					 
					
						2025-07-22 07:00:53 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						08638223ca 
					 
					
						
						
							
Fix including stubs in wheel (#2398)
						
						
						
						
						* fix including stubs in wheel
* fix bool_ 
						
						
							
						
					 
					
						2025-07-22 06:30:17 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						56cc858af9 
					 
					
						
						
							
Add contiguous_copy_cpu util for copying array (#2397)
						
						
						
						
							
						
					 
					
						2025-07-21 07:30:35 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						f55c4ed1d6 
					 
					
						
						
							
Remove thrust iterators (#2396)
						
						
						
						
							
						
					 
					
						2025-07-21 07:30:27 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						93d70419e7 
					 
					
						
						
							
[CUDA] speedup handling scalars (#2389)
						
						
						
						
						* speedup scalars in cuda
* comment 
						
						
							
						
					 
					
						2025-07-18 21:47:31 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						63f663d9c6 
					 
					
						
						
							
fix cuda manylinux version to match others (#2388)
						
						
						
						
							
						
					 
					
						2025-07-18 21:02:16 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						84b4d96efa 
					 
					
						
						
							
fix release build + patch bump (#2387)
						
						
						
						
							
 
						
					 
					
						2025-07-18 14:47:37 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						aec67f2fa6 
					 
					
						
						
							
patch bump (#2386)
						
						
						
						
							
						
					 
					
						2025-07-18 12:25:48 -07:00 
						 
				 
			
				
					
						
							
							
								Gökdeniz Gülmez 
							
						 
					 
					
						
						
							
						
						deee214a95 
					 
					
						
						
							
Adding support for the Muon Optimizer (#1914)
						
						
						
						
* initial commit with working optimizer
* update ACKNOWLEDGMENTS.md
* nits and adding it to test
* nits
* G.astype(mx.bfloat16) to G.astype(G.dtype)
* G.ndim >= 2 to assert G.ndim == 2
* remove comments
* replace with mx.addmm
* remove comments
* format
* nits
* match muon
* fix addmm
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-07-18 12:25:28 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						45adec102c 
					 
					
						
						
							
Add contiguous_copy_gpu util for copying array (#2379)
						
						
						
						
							
						
					 
					
						2025-07-18 06:44:25 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						31fc530c76 
					 
					
						
						
							
[CUDA] Add more ways finding CCCL headers in JIT (#2382)
						
						
						
						
							
						
					 
					
						2025-07-17 15:25:34 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						fbb3f65a1a 
					 
					
						
						
							
fix resource leaks in matmul and graph (#2383)
						
						
						
						
							
						
					 
					
						2025-07-17 06:50:15 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						6b1b8ea91b 
					 
					
						
						
							
[CUDA] Add work per thread to compile (#2368)
						
						
						
						
							
						
					 
					
						2025-07-17 06:47:52 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b2273733ea 
					 
					
						
						
							
Test with CUDA 12.2 (#2375)
						
						
						
						
						* Test with CUDA 12.0
* try older image
* fix cpu sort 
						
						
							
						
					 
					
						2025-07-16 13:00:37 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f409b229a4 
					 
					
						
						
							
fix ring distributed test (#2380)
						
						
						
						
							
						
					 
					
						2025-07-16 11:25:24 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						30571e2326 
					 
					
						
						
							
Rename the copy util in cpu/copy.h to copy_cpu (#2378)
						
						
						
						
							
						
					 
					
						2025-07-16 07:34:24 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d7734edd9f 
					 
					
						
						
							
fix complex reduce + nan propagation in min and max (#2377)
						
						
						
						
							
						
					 
					
						2025-07-15 18:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						2ba69bc8fa 
					 
					
						
						
							
lower memory uniform sampling (#2361)
						
						
						
						
						* lower memory uniform
* use fp32
* fix 
						
						
							
						
					 
					
						2025-07-15 14:22:07 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						cb349a291c 
					 
					
						
						
							
[CUDA] Use cuda::std::complex in place of cuComplex (#2372)
						
						
						
						
							
						
					 
					
						2025-07-15 00:36:13 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f0a0b077a0 
					 
					
						
						
							
Install linux with mlx[cuda] and mlx[cpu] (#2356)
						
						
						
						
						* install linux with mlx[cuda] and mlx[cpu]
* temp for testing
* cleanup circle, fix cuda repair
* update circle
* update circle
* decouple python bindings from core libraries 
						
						
							
						
					 
					
						2025-07-14 17:17:33 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						49114f28ab 
					 
					
						
						
							
fix flaky test (#2371)
						
						
						
						
							
						
					 
					
						2025-07-14 17:16:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e7d2ebadd2 
					 
					
						
						
							
[CUDA] Affine quantize (#2354)
						
						
						
						
						* affine quantize and dequantize kernels
* format
* fix
* format 
						
						
							
						
					 
					
						2025-07-14 15:45:44 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e569803d7c 
					 
					
						
						
							
update linux build (#2370)
						
						
						
						
							
						
					 
					
						2025-07-14 15:13:56 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						d34f887abc 
					 
					
						
						
							
Add Primitive::name and remove Primitive::print (#2365)
						
						
						
						
							
						
					 
					
						2025-07-14 14:06:35 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5201df5030 
					 
					
						
						
							
Fix imag() vjp (#2367)
						
						
						
						
							
						
					 
					
						2025-07-14 13:11:16 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						2d3c26c565 
					 
					
						
						
							
[CUDA] Do not put kernels in anonymous namespace (#2362)
						
						
						
						
							
						
					 
					
						2025-07-12 14:24:45 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						6325f60d52 
					 
					
						
						
							
[CUDA] Bundle CCCL for JIT compilation (#2357)
						
						
						
						
						* Ship CCCL for JIT compilation
* Remove cexpf 
						
						
							
						
					 
					
						2025-07-11 18:45:37 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						42cc9cfbc7 
					 
					
						
						
							
fix copy dispatch (#2360)
						
						
						
						
							
						
					 
					
						2025-07-11 10:59:35 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						8347575ba1 
					 
					
						
						
							
[CUDA] Implement Scan kernel (#2347)
						
						
						
						
						* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal 
						
						
							
						
					 
					
						2025-07-10 16:54:12 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						b6eec20260 
					 
					
						
						
							
Fix edge check in qmm_n QuantizedLoader (#2355)
						
						
						
						
							
						
					 
					
						2025-07-10 16:28:50 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0eb035b4b1 
					 
					
						
						
							
Fix type promotion in Adam with bias correction (#2350)
						
						
						
						
							
						
					 
					
						2025-07-10 11:14:42 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						afb9817599 
					 
					
						
						
							
[CUDA] Put version in ptx cache dir path (#2352)
						
						
						
						
							
						
					 
					
						2025-07-10 07:24:21 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						8fb3e7a26c 
					 
					
						
						
							
[CUDA] Set current device before cudaGraphLaunch (#2351)
						
						
						
						
							
						
					 
					
						2025-07-10 07:24:02 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8c7bc30ce4 
					 
					
						
						
							
Align mlx::core::min op nan propagation with NumPy (#2346)
						
						
						
						
							
						
					 
					
						2025-07-10 06:20:43 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						85873cb162 
					 
					
						
						
							
[CUDA] Do vectorized store/load in contiguous elementwise ops (#2342)
						
						
						
						
						* Do vectorized store/load in unary ops
* Do vectorized store/load in binary_two ops
* Do vectorized store/load in copy ops
* Do vectorized store/load in ternary ops
* Use int32_t for IdxT
* binary => binary_two in binary_two.cu
* Fix tests on large arrays
* Use uint as index type
* Contig uses uint as index and non-contig uses int 
						
						
							
						
					 
					
						2025-07-09 18:48:43 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e14ee12491 
					 
					
						
						
							
add zero for argsort vjp (#2345)
						
						
						
						
							
						
					 
					
						2025-07-09 14:37:14 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8b9a3f3cea 
					 
					
						
						
							
Align mlx::core::max op nan propagation with NumPy (#2339)
						
						
						
						
						* Make max op NaN propagation rules align with numpy
* Adding benchmarks and testing for max op nan propagation
* Pre-commit formatting
* Fix max complex64 nan propagation and add test
* Improve the cpp unittest
* Only check nans on non-integral types in simd_reduce_impl.
* Cleanup using namespace alias
* Add cpu Max nan propagation. Fix a small fib in cpu max dispatch data types for int8/int16.
* Make the max nan propagation test more meaningful for integer types
* Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nan propagation tests, fix cuda implementation in a separate PR.
						
						
							
						
					 
					
						2025-07-09 11:26:27 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						fb4e8b896b 
					 
					
						
						
							
patch bump (#2343)
						
						
						
						
							
 
						
					 
					
						2025-07-08 14:26:07 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						2ca533b279 
					 
					
						
						
							
Fix compilation with CUDA 11 (#2331)
						
						
						
						
							
						
					 
					
						2025-07-07 20:00:43 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						4a9b29a875 
					 
					
						
						
							
MoE backward improvements (#2335)
						
						
						
						
							
						
					 
					
						2025-07-07 17:59:53 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a4fcc893cd 
					 
					
						
						
							
auto build linux release (#2341)
						
						
						
						
							
						
					 
					
						2025-07-07 09:29:23 -07:00