Awni Hannun 
							
						 
					 
					
						
						
							
						
						e7d2ebadd2 
					 
					
						
						
							
							[CUDA] Affine quantize ( #2354 )  
						
						
						
						
						* affine quantize and dequantize kernels
* format
* fix
* format 
						
						
					 
					
						2025-07-14 15:45:44 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5201df5030 
					 
					
						
						
							
							Fix imag() vjp ( #2367 )  
						
						
						
						
					 
					
						2025-07-14 13:11:16 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						8347575ba1 
					 
					
						
						
							
							[CUDA] Implement Scan kernel ( #2347 )  
						
						
						
						
						* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal 
						
						
					 
					
						2025-07-10 16:54:12 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0eb035b4b1 
					 
					
						
						
							
							Fix type promotion in Adam with bias correction ( #2350 )  
						
						
						
						
					 
					
						2025-07-10 11:14:42 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8c7bc30ce4 
					 
					
						
						
							
							Align mlx::core::min op nan propagation with NumPy ( #2346 )  
						
						
						
						
					 
					
						2025-07-10 06:20:43 -07:00 
						 
				 
			
				
					
						
							
							
								jhavukainen 
							
						 
					 
					
						
						
							
						
						8b9a3f3cea 
					 
					
						
						
							
							Align mlx::core::max op nan propagation with NumPy ( #2339 )  
						
						
						
						
						* Make max op NaN propagation rules align with numpy
* Adding benchmarks and testing for max op nanpropagation
* Pre-commit formatting
* Fix max complex64 nan propagation and add test
* Improve the cpp unittest
* Only check nans on non-integral types in simd_reduce_impl.
* Cleanup using namespace alias
* Add cpu Max nanpropagation. Fix a small fib in cpu max dispatch data types for int8/int16.
* Make the max nanpropagation test more meaningful for integer types
* Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nanpropagation tests, fix cuda implementation in a separate PR. 
						
						
					 
					
						2025-07-09 11:26:27 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						4a9b29a875 
					 
					
						
						
							
							MoE backward improvements ( #2335 )  
						
						
						
						
					 
					
						2025-07-07 17:59:53 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec0d5db67b 
					 
					
						
						
							
							[CUDA] Switch to CUDA graphs ( #2317 )  
						
						
						
						
						* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
consistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment 
						
						
					 
					
						2025-07-02 15:59:13 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cfb6a244ea 
					 
					
						
						
							
							allow parameters to be deleted ( #2325 )  
						
						
						
						
					 
					
						2025-07-01 21:27:23 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dd4f53db63 
					 
					
						
						
							
							use fp32 for testing, add more complex ops ( #2322 )  
						
						
						
						
					 
					
						2025-07-01 07:30:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						33bf1a244b 
					 
					
						
						
							
							Fix module update in strict mode ( #2321 )  
						
						
						
						
						* fix module update in strict mode
* allow GELU to be pickled 
						
						
					 
					
						2025-06-29 11:12:29 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						772f471ff2 
					 
					
						
						
							
							[CUDA] Fix reductions ( #2314 )  
						
						
						
						
					 
					
						2025-06-27 12:59:20 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						2c11d10f8d 
					 
					
						
						
							
							Split broadcast so it is always fused in compile ( #2318 )  
						
						
						
						
					 
					
						2025-06-26 22:08:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						81bb9a2a9e 
					 
					
						
						
							
							Compile float64 functions on CPU ( #2311 )  
						
						
						
						
					 
					
						2025-06-24 10:18:52 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5adf185f86 
					 
					
						
						
							
							Fix update_modules() when providing a subset ( #2308 )  
						
						
						
						
					 
					
						2025-06-20 17:19:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cad5c0241c 
					 
					
						
						
							
							[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )  
						
						
						
						
						* cuda synch properly waits for all tasks to finish and clear
* fix copy 
						
						
					 
					
						2025-06-17 12:03:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b8022c578a 
					 
					
						
						
							
							divmod, partition, sort fixes ( #2302 )  
						
						
						
						
					 
					
						2025-06-16 18:49:32 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bc53f8293f 
					 
					
						
						
							
							Cuda bug fixes 2 ( #2298 )  
						
						
						
						
						* more bug fixes
* more bug fixes
* format 
						
						
					 
					
						2025-06-16 13:14:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c552ff2451 
					 
					
						
						
							
							[CUDA] Fix back-end bugs and enable corresponding tests ( #2296 )  
						
						
						
						
						* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format 
						
						
					 
					
						2025-06-16 08:45:40 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4fda5fbdf9 
					 
					
						
						
							
							add python testing for cuda with ability to skip list of tests ( #2295 )  
						
						
						
						
					 
					
						2025-06-15 10:56:48 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8402a2acf4 
					 
					
						
						
							
							Fix complex power and print ( #2286 )  
						
						
						
						
						* fix complex power and print
* fix complex matmul shape 
						
						
					 
					
						2025-06-13 11:13:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c35f4d089a 
					 
					
						
						
							
							start cuda circle config ( #2256 )  
						
						
						
						
						* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config 
						
						
					 
					
						2025-06-10 21:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						8590c0941e 
					 
					
						
						
							
							Add load_safe to the general conv loaders ( #2258 )  
						
						
						
						
					 
					
						2025-06-10 20:58:16 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						62fecf3e13 
					 
					
						
						
							
							fix conv export ( #2265 )  
						
						
						
						
					 
					
						2025-06-10 09:34:01 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						9ce77798b1 
					 
					
						
						
							
							fix export to work with gather/scatter axis ( #2263 )  
						
						
						
						
					 
					
						2025-06-09 20:37:27 -07:00 
						 
				 
			
				
					
						
							
							
								Emmanuel Ferdman 
							
						 
					 
					
						
						
							
						
						5866b3857b 
					 
					
						
						
							
							Refactor the lu test ( #2250 )  
						
						
						
						
						Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
						
						
					 
					
						2025-06-07 06:12:08 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1ca616844b 
					 
					
						
						
							
							Fix unintuitive metal kernel caching ( #2242 )  
						
						
						
						
						* Fix unintuitive metal kernel caching
* alternative solution 
						
						
					 
					
						2025-06-06 20:08:15 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c763fe1be0 
					 
					
						
						
							
							default strict mode for module update and update_modules ( #2239 )  
						
						
						
						
					 
					
						2025-06-05 15:27:02 -07:00 
						 
				 
			
				
					
						
							
							
								Suryash Malviya 
							
						 
					 
					
						
						
							
						
						0408ba0a76 
					 
					
						
						
							
							Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm  ( #2220 )  
						
						
						
						
						* Implementing Complex Matmul using Karatsuba Algorithm
* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
					 
					
						2025-06-02 15:58:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6ef2f67e7f 
					 
					
						
						
							
							5bit quants ( #2226 )  
						
						
						
						
						* 5bit quants
* 5bit quants 
						
						
					 
					
						2025-05-30 12:12:10 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0359bf02c9 
					 
					
						
						
							
							Nearest upsample ( #2202 )  
						
						
						
						
					 
					
						2025-05-19 11:23:38 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8576e6fe36 
					 
					
						
						
							
							fix conv2d bug + faster conv 1d ( #2195 )  
						
						
						
						
						* fix conv2d bug + faster conv 1d
* revert sort + flaky test 
						
						
					 
					
						2025-05-18 06:05:11 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0654543dcc 
					 
					
						
						
							
							Add complex eigh ( #2191 )  
						
						
						
						
					 
					
						2025-05-18 00:18:43 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						602f43e3d1 
					 
					
						
						
							
							fix conv grad ( #2187 )  
						
						
						
						
					 
					
						2025-05-15 19:20:36 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a2cadb8218 
					 
					
						
						
							
							real and imag properties ( #2189 )  
						
						
						
						
					 
					
						2025-05-15 18:17:50 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c1eb9d05d9 
					 
					
						
						
							
							non-symmetric eig and eigh ( #2188 )  
						
						
						
						
					 
					
						2025-05-15 13:01:44 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						cf6c939e86 
					 
					
						
						
							
							Fix some complex vjps ( #2178 )  
						
						
						
						
					 
					
						2025-05-14 23:37:12 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						130df35e1b 
					 
					
						
						
							
							Add random normal distribution for complex numbers ( #2182 )  
						
						
						
						
					 
					
						2025-05-13 22:43:45 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						3aa9cf3f9e 
					 
					
						
						
							
							Fix put_along_axis for empty arrays ( #2181 )  
						
						
						
						
					 
					
						2025-05-13 14:27:53 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8f3d208dce 
					 
					
						
						
							
							Close a couple edge case bugs: hadamard and addmm on empty inputs ( #2177 )  
						
						
						
						
						* handle hadamard and addmm on empty inputs
* fix 
						
						
					 
					
						2025-05-12 10:48:57 -07:00 
						 
				 
			
				
					
						
							
							
								ATurker 
							
						 
					 
					
						
						
							
						
						a7fae8a176 
					 
					
						
						
							
							fix: conv_general differences between gpu, cpu ( #2070 )  
						
						
						
						
						* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
					 
					
						2025-05-09 10:26:52 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						af705590ac 
					 
					
						
						
							
							fix batched vector sdpa ( #2152 )  
						
						
						
						
					 
					
						2025-05-05 13:13:03 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						481349495b 
					 
					
						
						
							
							GPU Hadamard for large N ( #1879 )  
						
						
						
						
					 
					
						2025-05-01 17:19:17 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						9daa6b003f 
					 
					
						
						
							
							fix shapeless export ( #2148 )  
						
						
						
						
					 
					
						2025-05-01 15:02:02 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						aa5d84f102 
					 
					
						
						
							
							Allow quant layer to be unfrozen ( #2142 )  
						
						
						
						
					 
					
						2025-04-30 09:08:29 -07:00 
						 
				 
			
				
					
						
							
							
								Aashiq Dheeraj 
							
						 
					 
					
						
						
							
						
						bb6565ef14 
					 
					
						
						
							
							add fftshift and ifftshift fft helpers ( #2135 )  
						
						
						
						
						* add fftshift and ifftshift fft helpers
* address comments
* axes have to be iterable
* fix fp error in roll + add test
---------
Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>
						
						
					 
					
						2025-04-29 22:13:45 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7bb063bcb3 
					 
					
						
						
							
							Enable vjp for quantized scale and bias ( #2129 )  
						
						
						
						
						* Enable vjp for quantized scale and bias
* higher tol 
						
						
					 
					
						2025-04-29 13:03:09 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						fbc89e3ced 
					 
					
						
						
							
							fix pinv ( #2110 )  
						
						
						
						
					 
					
						2025-04-23 13:08:28 -07:00 
						 
				 
			
				
					
						
							
							
								Param Thakkar 
							
						 
					 
					
						
						
							
						
						600e87e03c 
					 
					
						
						
							
							Added output_padding parameters in conv_transpose ( #2092 )  
						
						
						
						
					 
					
						2025-04-23 09:26:33 -07:00 
						 
				 
			
				
					
						
							
							
								Hyunsung Lee 
							
						 
					 
					
						
						
							
						
						3836445241 
					 
					
						
						
							
							Add broadcast_shapes in python API ( #2091 )  
						
						
						
						
					 
					
						2025-04-22 18:57:39 -07:00