Awni Hannun 
							
						 
					 
					
						
						
							
						
						005e7efa64 
					 
					
						
						
							
							fix mask in sdpa ( #1980 )  
						
						
						
						
						* fix mask in sdpa
* fix attention mask
* Re-enable routing for array mask
---------
Co-authored-by: Jagrit Digani <digani@apple.com>
						
						
							
						
					 
					
						2025-03-20 14:53:12 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						b42d13ec84 
					 
					
						
						
							
							Update attention tests to show diff, disable array masks ( #1978 )  
						
						
						
						
							
						
					 
					
						2025-03-20 14:25:38 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						9adcd1a650 
					 
					
						
						
							
							Support fused masking in Attention ( #1924 )  
						
						
						
						
						* Update API to allow mask='causal' in fast::sdpa
* Add fallback
* Update steel::AttnParams
* Fix typo
* WIP, basic causal
* Update tests
* Update benchmarking
* Update masking loop limits
* Add bool masking and update tests
* Update additive mask
* Update benchmarks
* Update benchmarks
* Update tests
* Update for bfloat error
* Update early exit
* Add random seed to tests 
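The three mask forms named in the list above (the 'causal' string, a boolean mask, and an additive mask) can be related with a small reference sketch. This is plain Python for illustration only, not the fused Metal kernel and not the MLX API: the function name `sdpa_reference`, its argument layout, and the choice to align the causal diagonal to the last key are assumptions made here.

```python
import math


def sdpa_reference(q, k, v, scale, mask=None):
    """Unfused scaled dot-product attention for a single head.

    q: [Lq][D], k: [Lk][D], v: [Lk][Dv], all lists of lists of floats.
    mask: None, the string "causal", or an [Lq][Lk] nested list holding
    bools (True keeps a score) or floats (added to the score).
    Assumes every query row keeps at least one key.
    """
    n_q, n_k = len(q), len(k)
    out = []
    for i in range(n_q):
        scores = []
        for j in range(n_k):
            s = scale * sum(a * b for a, b in zip(q[i], k[j]))
            if mask == "causal":
                # Assumption: query i may see keys up to i + (n_k - n_q).
                if j > i + (n_k - n_q):
                    s = -math.inf
            elif mask is not None:
                m = mask[i][j]
                if isinstance(m, bool):
                    if not m:
                        s = -math.inf  # boolean mask: False drops the key
                else:
                    s += m  # additive mask: bias added before softmax
            scores.append(s)
        # Numerically stable softmax over the keys.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * row[d] for w, row in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out
```

With square inputs, a lower-triangular boolean mask and an additive mask of 0 and negative infinity both give the same output as `mask="causal"` in this sketch, which is the equivalence a fused-versus-fallback test would typically check.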
						
						
							
						
					 
					
						2025-03-20 11:01:32 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						3c164fca8c 
					 
					
						
						
							
							Fix multistream GPU deadlock ( #1969 )  
						
						
						
						
						* fix multistream GPU deadlock
* comments 
						
						
							
						
					 
					
						2025-03-20 07:19:47 -07:00 
						 
				 
			
				
					
						
							
							
								jiyzhang 
							
						 
					 
					
						
						
							
						
						95e335db7b 
					 
					
						
						
							
							Update smooth_l1_loss in losses.py ( #1974 )  
						
						
						
						
						According to the definition of smooth_l1_loss, the line
diff = predictions - targets
should be updated to
diff = mx.abs(predictions - targets)
After this change, the result is consistent with PyTorch's smooth_l1_loss.
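Why the sign matters can be seen in a small pure-Python sketch. It assumes the loss picks its quadratic or linear branch by testing `diff < beta`, as in the standard smooth L1 definition. The helper `smooth_l1_loss_signed` is a hypothetical stand-in for the pre-fix behavior, and neither function is the MLX source.

```python
def smooth_l1_loss(predictions, targets, beta=1.0):
    """Mean smooth L1 loss using the absolute difference (the fixed form)."""
    total = 0.0
    for p, t in zip(predictions, targets):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # quadratic near zero
        else:
            total += diff - 0.5 * beta  # linear for large errors
    return total / len(predictions)


def smooth_l1_loss_signed(predictions, targets, beta=1.0):
    """Hypothetical pre-fix variant: the branch test sees a signed diff."""
    total = 0.0
    for p, t in zip(predictions, targets):
        diff = p - t
        if diff < beta:  # any negative diff lands here, however large
            total += 0.5 * diff * diff / beta
        else:
            total += abs(diff) - 0.5 * beta
    return total / len(predictions)
```

For a prediction of 0.0 and a target of 3.0, the signed variant returns 4.5 because the error of -3 falls into the quadratic branch, while the absolute-value form returns 2.5. Positive errors give the same result in both.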
						
						
							
						
					 
					
						2025-03-19 20:19:02 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f90206ad74 
					 
					
						
						
							
							Guard nullptr dereference ( #1972 )  
						
						
						
						
						* guard nullptr dereference
* comment 
						
						
							
						
					 
					
						2025-03-19 16:24:10 -07:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						3779150750 
					 
					
						
						
							
							refactor: all use schedule ( #1973 )  
						
						
						
						
							
						
					 
					
						2025-03-19 11:24:04 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						0a9777aa5c 
					 
					
						
						
							
							Do not define MLX_VERSION globally ( #1966 )  
						
						
						
						
							
						
					 
					
						2025-03-18 07:12:40 -07:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						45ad06aac8 
					 
					
						
						
							
							Fix typo; Fix lint warning when reuse the same name ( #1968 )  
						
						
						
						
						* Fix typo; Fix lint warning when reuse the same name
* Add missing period 
						
						
							
						
					 
					
						2025-03-18 07:12:24 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c6ea2ba329 
					 
					
						
						
							
							Use same accumulation precision in gemv as gemm ( #1962 )  
						
						
						
						
						* use same accumulation precision in gemv as gemm
* faster
* fix compile 
						
						
							
						
					 
					
						2025-03-16 07:13:24 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						2770a10240 
					 
					
						
						
							
							fix grad with inplace updates ( #1961 )  
						
						
						
						
							
						
					 
					
						2025-03-13 19:13:09 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d2a94f9e6a 
					 
					
						
						
							
							Only compile warnings as errors for circle ( #1957 )  
						
						
						
						
							
						
					 
					
						2025-03-12 13:08:19 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						32da94507a 
					 
					
						
						
							
							fix vmap for flatten ( #1955 )  
						
						
						
						
							
						
					 
					
						2025-03-11 10:42:22 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						736a340478 
					 
					
						
						
							
							reduce binary size ( #1952 )  
						
						
						
						
							
						
					 
					
						2025-03-11 06:30:44 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						117e1355a2 
					 
					
						
						
							
							fix copy for large arrays ( #1953 )  
						
						
						
						
							
						
					 
					
						2025-03-10 15:04:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						3c3e558c60 
					 
					
						
						
							
							Support transposed head/seq for kv ( #1950 )  
						
						
						
						
						* support transposed head/seq for kv
* fix flaky test
* nit 
						
						
							
						
					 
					
						2025-03-10 10:53:45 -07:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						cffceda6ee 
					 
					
						
						
							
							Add type hint for _extra_repr ( #1948 )  
						
						
						
						
							
						
					 
					
						2025-03-10 06:05:36 -07:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						048805ad2c 
					 
					
						
						
							
							Remove unused modules ( #1949 )  
						
						
						
						
							
						
					 
					
						2025-03-10 06:05:26 -07:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						d14c9fe7ea 
					 
					
						
						
							
							Add file info when raising errors in save ( #1943 )  
						
						
						
						
							
						
					 
					
						2025-03-08 14:51:04 -08:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						5db90ce822 
					 
					
						
						
							
							Fix obscured warning ( #1944 )  
						
						
						
						
							
						
					 
					
						2025-03-08 14:50:39 -08:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						d699cc1330 
					 
					
						
						
							
							Fix unreachable warning ( #1939 )  
						
						
						
						
						* Fix unreachable warning
* Update error message 
						
						
							
						
					 
					
						2025-03-07 17:23:04 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c4230747a1 
					 
					
						
						
							
							redesign for faster cpu/gpu synch ( #1869 )  
						
						
						
						
						* redesign for faster cpu/gpu synch
* load + more async CPU
* use command encoder API and move more ops to use it
* make fence back-end generic + CPU only fence
* faster build
* fix async eval
* fixes + handle temporaries
* fix / improve cpu conv
* remove unused status, fix siblings
* fix extensions
* fix
* fix no cpu build
* format
* comments
* fix perf regression, remove unnecessary abort
* fix events, task limit cpu
* fix waiting
* fix donation / temporaries in normalization 
						
						
							
						
					 
					
						2025-03-06 19:23:38 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5245f12a46 
					 
					
						
						
							
							always use json ( #1938 )  
						
						
						
						
							
						
					 
					
						2025-03-06 15:35:56 -08:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						a198b2787e 
					 
					
						
						
							
							Remove unused modules ( #1936 )  
						
						
						
						
							
						
					 
					
						2025-03-06 14:20:27 -08:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						04edad8c59 
					 
					
						
						
							
							Add doc string for path ( #1937 )  
						
						
						
						
							
						
					 
					
						2025-03-06 14:20:09 -08:00 
						 
				 
			
				
					
						
							
							
								David Wisdom 
							
						 
					 
					
						
						
							
						
						392b3060b0 
					 
					
						
						
							
							Fix typo in randint docstring ( #1932 )  
						
						
						
						
						This commit fixes a typo in the docstring for mlx.core.random.randint() by changing "roadcastable" to "broadcastable". 
						
						
							
						
					 
					
						2025-03-05 21:48:00 -08:00 
						 
				 
			
				
					
						
							
							
								Chunyang Wen 
							
						 
					 
					
						
						
							
						
						85b34d59bc 
					 
					
						
						
							
							Clean unused sys ( #1929 )  
						
						
						
						
							
						
					 
					
						2025-03-05 13:48:03 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f599c11bc8 
					 
					
						
						
							
							bump ( #1931 )  
						
						
						
						
							
 
						
					 
					
						2025-03-05 13:16:53 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0792ff02ff 
					 
					
						
						
							
							Only fail when 10 consecutive socket errors occur ( #1928 )  
						
						
						
						
							
						
					 
					
						2025-03-05 13:16:19 -08:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						fd0d63ba5b 
					 
					
						
						
							
							Affine quant always in fp32 ( #1925 )  
						
						
						
						
						* do affine quant in fp32
* static cast 
						
						
							
						
					 
					
						2025-03-04 17:50:19 -08:00 
						 
				 
			
				
					
						
							
							
								Abe Leininger 
							
						 
					 
					
						
						
							
						
						3835a428c5 
					 
					
						
						
							
							Adds nuclear norm support ( #1894 )  
						
						
						
						
						* adjust norm unit test tolerance 
						
						
							
						
					 
					
						2025-03-04 13:26:02 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						9680f72cca 
					 
					
						
						
							
							Add a multi optimizer ( #1916 )  
						
						
						
						
							
						
					 
					
						2025-03-04 13:16:35 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						a0737273d3 
					 
					
						
						
							
							Allow debugging in distributed mode ( #1920 )  
						
						
						
						
							
						
					 
					
						2025-03-04 13:01:10 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e613d0eaf0 
					 
					
						
						
							
							SDPA support for small batch (over sequence) queries ( #1922 )  
						
						
						
						
						* batch query sdpa
* batch sdpa for query 
						
						
							
						
					 
					
						2025-03-04 10:59:04 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6bcd6bcf70 
					 
					
						
						
							
							fix donation in scan ( #1917 )  
						
						
						
						
							
						
					 
					
						2025-03-03 11:30:59 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ba12e4999a 
					 
					
						
						
							
							Use a heap for small sizes ( #1911 )  
						
						
						
						
						* use a heap for small sizes
* check if VM 
						
						
							
						
					 
					
						2025-03-03 06:50:57 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4e7cd31d12 
					 
					
						
						
							
							Fix slice data size ( #1913 )  
						
						
						
						
						* fix slice data size
* add test 
						
						
							
						
					 
					
						2025-03-02 21:50:42 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5e6c130d93 
					 
					
						
						
							
							RMS norm without scaling ( #1915 )  
						
						
						
						
							
						
					 
					
						2025-02-28 20:26:57 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5d68082881 
					 
					
						
						
							
							Ring docs ( #1829 )  
						
						
						
						
							
						
					 
					
						2025-02-28 11:34:21 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						607181644f 
					 
					
						
						
							
							Add mlx.distributed_config script ( #1902 )  
						
						
						
						
							
						
					 
					
						2025-02-28 11:16:39 -08:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						89d327075f 
					 
					
						
						
							
							Enabling fused attention for head dim 128 ( #1899 )  
						
						
						
						
						* Share KV smem
* Fix bfloat error
* Unroll O = S @ V loop
* Perf upgrade
* Remove commented out function
* Add -Wno-c++17-extensions flag to metal flags
* Add -Wno-c++17-extensions flag to metal extension flags 
						
						
							
						
					 
					
						2025-02-26 10:02:06 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						6bf00ef631 
					 
					
						
						
							
							Fix ring of 2 and allow scalars in API ( #1906 )  
						
						
						
						
							
						
					 
					
						2025-02-25 17:03:01 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7d042f17fe 
					 
					
						
						
							
							Double for lapack ( #1904 )  
						
						
						
						
						* double for lapack ops
* add double support for lapack ops 
						
						
							
						
					 
					
						2025-02-25 11:39:36 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						28b8079e30 
					 
					
						
						
							
							fix double type promotion ( #1901 )  
						
						
						
						
							
						
					 
					
						2025-02-25 06:00:53 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7face5d9fd 
					 
					
						
						
							
							fix cpu compile ( #1897 )  
						
						
						
						
							
						
					 
					
						2025-02-24 14:10:30 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a44dc4bdb0 
					 
					
						
						
							
							fix leaking objc ( #1898 )  
						
						
						
						
							
						
					 
					
						2025-02-24 13:57:59 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						2d0f384b6f 
					 
					
						
						
							
							fix simd erf_inv ( #1896 )  
						
						
						
						
							
						
					 
					
						2025-02-24 13:57:47 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8ff84b5c43 
					 
					
						
						
							
							fix version and expose command queue getter ( #1892 )  
						
						
						
						
							
						
					 
					
						2025-02-20 15:25:15 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						10b271d963 
					 
					
						
						
							
							Ring update ( #1885 )  
						
						
						
						
							
						
					 
					
						2025-02-20 14:32:31 -08:00 
						 
				 
			
				
					
						
							
							
								Jesper Stemann Andersen 
							
						 
					 
					
						
						
							
						
						0ebc8a3d25 
					 
					
						
						
							
							Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp ( #1890 )  
						
						
						
						
							
						
					 
					
						2025-02-20 12:02:12 -08:00