Angelos Katharopoulos · e397177f6e · 2025-08-20 17:20:22 -07:00
Custom cuda kernel (#2517)
								 
Awni Hannun · c5fcd5b61b · 2025-08-18 06:45:59 -07:00
fix custom kernel test (#2510)
								 
Awni Hannun · 4fda5fbdf9 · 2025-06-15 10:56:48 -07:00
add python testing for cuda with ability to skip list of tests (#2295)
								 
Awni Hannun · 1ca616844b · 2025-06-06 20:08:15 -07:00
Fix unintuitive metal kernel caching (#2242)
* Fix unintuitive metal kernel caching
* alternative solution
								 
Angelos Katharopoulos · 5e6c130d93 · 2025-02-28 20:26:57 -08:00
RMS norm without scaling (#1915)
								 
Angelos Katharopoulos · 78ba24c37d · 2025-02-19 14:43:39 -08:00
Raise an exception in the rope op if input is integer (#1884)
								 
Awni Hannun · 121d9a0702 · 2025-01-26 19:07:21 -08:00
Fix rope fallback to not upcast (#1797)
* fix rope fallback to not upcast
* Update mlx/fast.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
								 
Awni Hannun · 0308e9af71 · 2024-12-19 15:51:44 -08:00
Allow offset to be an mx.array for mx.fast.rope (#1724)
* allow offset for rope
* comment
								 
Alex Barron · c79f6a4a8c · 2024-11-22 10:22:13 -08:00
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
								 
Awni Hannun · 76f275b4df · 2024-11-04 13:24:02 -08:00
error in rms for wrong size (#1562)
								 
Awni Hannun · 3ae6aabe9f · 2024-09-09 14:54:31 -07:00
throw for certain cases of non captured inputs in compile (#1401)
								 
Awni Hannun · ba3e913c7a · 2024-09-06 19:16:50 -07:00
Simplifications for MLX C (#1396)
* simplifications for MLX C
* use vectors instead of map
* update examples
								 
Awni Hannun · 7cca1727af · 2024-09-04 19:10:43 -07:00
Fix slice data size (#1394)
* fix slice data size and add tests
* fix contiguous flag
* simplify stride and perform copy for non-contiguous arrays
* fix cpu
* comment
								 
Alex Barron · 1d94ac3f90 · 2024-08-26 21:45:45 -07:00
Add optional headers to `mx.fast.metal_kernel` (#1358)
								 
Alex Barron · da8deb2b62 · 2024-08-23 10:06:15 -07:00
fix bug with multiple attributes (#1348)
Co-authored-by: Alex Barron <abarron22@apple.com>
								 
Alex Barron · 0fd2a1f4b0 · 2024-08-22 13:46:29 -07:00
Custom Metal Kernels from Python (#1325)
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
								 
Awni Hannun · d40e76809f · 2024-08-20 17:37:52 -07:00
Fix rope (#1340)
* add test
* fix rope
* fix test
								 
Awni Hannun · bb1b76d9dc · 2024-08-19 18:30:50 -07:00
RoPE with frequencies as optional input (#1337)
* start rope with freq input
* rope with frequencies
* nits
* fix bug
* fix bug + test
* cleanup
* optional base
								 
Angelos Katharopoulos · 780c197f95 · 2024-08-08 14:51:09 -07:00
Fix test tolerance and patch bump (#1315)
								 
Alex Barron · c52d1600f0 · 2024-07-29 15:11:38 -07:00
Fused Affine Quantize/Dequantize ops (#1282)
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
								 
Awni Hannun · ae18326533 · 2024-04-11 21:15:36 -07:00
No copy command encoder (#986)
* no copy command encoder
* up layer norm test tolerances
								 
Awni Hannun · d88d2124b5 · 2024-04-04 10:59:15 -07:00
segfaut layer norm grad (#955)
								 
Angelos Katharopoulos · 110d9b149d · 2024-04-01 06:15:50 -07:00
Layer norm grad fix donation bug (#941)
* add layer norm grad test
* Fix donation bug in layernorm vjp
---------
Co-authored-by: Awni Hannun <awni@apple.com>
								 
Angelos Katharopoulos · 29221fa238 · 2024-03-26 16:35:34 -07:00
Implement vjps for some primitives in the fast namespace (#883)
* Implement rope vjp in terms of rope
* RMSNormVJP primitive and kernel
* Add LayerNormVJP primitive and kernel
								 
Angelos Katharopoulos · fcda3a0e66 · 2024-03-22 12:10:27 -07:00
Increase test tolerance for fast.layer_norm (#880)
								 
Angelos Katharopoulos · 2225374060 · 2024-03-21 13:55:51 -07:00
Adds mx.fast.layer_norm (#870)
								 
Awni Hannun · a54f06b16f · 2024-03-21 07:20:54 -07:00
Fast RMS Norm (#862)
* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Make the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
								 
Awni Hannun · ccf1645995 · 2024-02-14 14:04:25 -08:00
Custom primitive + RoPE fat op (#676)
* extensions start
* rope custom op
* fix build
* docs + rope benchmark
* fix test
* Add a Metal kernel for RoPE
* Fix position of traditional
* transform tests
* Move rope computation to float and fix tests
* Fix the test and a typo
* change to fast
* fix no metal build
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>