Awni Hannun 
							
						 
					 
					
						
						
							
						
						fa89f0b150 
					 
					
						
						
							
							faster gather qmm sorted test (#2463)
						
						 
						
						
						
						
					 
					
						2025-08-05 06:27:40 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5597fa089c 
					 
					
						
						
							
							Fix qvm splitk (#2415)
						
						 
						
						
						
						
					 
					
						2025-07-25 11:50:24 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						4a9b29a875 
					 
					
						
						
							
							MoE backward improvements (#2335)
						
						 
						
						
						
						
					 
					
						2025-07-07 17:59:53 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4fda5fbdf9 
					 
					
						
						
							
							add python testing for cuda with ability to skip list of tests (#2295)
						
						 
						
						
						
						
					 
					
						2025-06-15 10:56:48 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6ef2f67e7f 
					 
					
						
						
							
							5bit quants (#2226)
						
						 
						
						
						
						
						* 5bit quants
						
						
					 
					
						2025-05-30 12:12:10 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7bb063bcb3 
					 
					
						
						
							
							Enable vjp for quantized scale and bias (#2129)
						
						 
						
						
						
						
						* Enable vjp for quantized scale and bias
* higher tol 
						
						
					 
					
						2025-04-29 13:03:09 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5de6d94a90 
					 
					
						
						
							
							Gather qmm batched kernel and refactoring of quantized (#2078)
						
						 
						
						
						
						
					 
					
						2025-04-17 13:53:11 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4758c8baa1 
					 
					
						
						
							
							Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777)
						
						 
						
						
						
						
						* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv 
						
						
					 
					
						2025-01-29 14:34:49 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						0c259961ac 
					 
					
						
						
							
							matmul jvps (#1772)
						
						 
						
						
						
						
					 
					
						2025-01-17 10:36:26 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Alex Barron 
							
						 
					 
					
						
						
							
						
						c7b0300af5 
					 
					
						
						
							
							Fix batched qmv bug (#1758)
						
						 
						
						
						
						
					 
					
						2025-01-09 11:45:57 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Alex Barron 
							
						 
					 
					
						
						
							
						
						c79f6a4a8c 
					 
					
						
						
							
							3 and 6 bit quantization (#1613)
						
						 
						
						
						
						
						* Support 3 and 6 bit quantization 
						
						
					 
					
						2024-11-22 10:22:13 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Alex Barron 
							
						 
					 
					
						
						
							
						
						26be608470 
					 
					
						
						
							
							Add split_k qvm for long context (#1564)
						
						 
						
						
						
						
						* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance 
						
						
					 
					
						2024-11-05 11:25:19 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Alex Barron 
							
						 
					 
					
						
						
							
						
						d15fa13daf 
					 
					
						
						
							
							Batched Quantized Matmul + Fast Small QMV (#1503)
						
						 
						
						
						
						
						* add fast qmv for small dims
* fix test
* batched cpu
* add batched template param
* refactor metal quantized.cpp 
						
						
					 
					
						2024-10-21 16:23:17 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Alex Barron 
							
						 
					 
					
						
						
							
						
						c52d1600f0 
					 
					
						
						
							
							Fused Affine Quantize/Dequantize ops (#1282)
						
						 
						
						
						
						
						* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize 
						
						
					 
					
						2024-07-29 15:11:38 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d568c7ee36 
					 
					
						
						
							
							Rename block sparse (#1149)
						
						 
						
						
						
						
						* block_sparse_mm to gather_mm
* rename
* nit
* nit 
						
						
					 
					
						2024-05-22 07:48:34 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e78a6518fa 
					 
					
						
						
							
							Block sparse qmm (#1124)
						
						 
						
						
						
						
					 
					
						2024-05-16 15:24:14 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						17f57df797 
					 
					
						
						
							
							Improvements in the quantizer and dequantization kernel (#1061)
						
						 
						
						
						
						
					 
					
						2024-05-01 18:19:11 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						8db7161c94 
					 
					
						
						
							
							Bug fix in quantize (#1054)
						
						 
						
						
						
						
					 
					
						2024-04-29 20:55:04 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						ec8578d41a 
					 
					
						
						
							
							Fix quantization of all 0s (#1028)
						
						 
						
						
						
						
					 
					
						2024-04-24 00:40:42 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						84d61d27aa 
					 
					
						
						
							
							Make sure 0 is represented in the quantization (#1016)
						
						 
						
						
						
						
					 
					
						2024-04-19 19:47:26 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						039da779d1 
					 
					
						
						
							
							No quant reshape (#957)
						
						 
						
						
						
						
						* precise option on cpu
* remove print
* remove reshape in quant matmul
* no quant reshape 
						
						
					 
					
						2024-04-04 11:52:12 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5f9ba3019f 
					 
					
						
						
							
							Fix qmm_t for unaligned cases (#923)
						
						 
						
						
						
						
					 
					
						2024-03-28 15:34:57 -07:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						40c108766b 
					 
					
						
						
							
							Quantized matmul fix (#677)
						
						 
						
						
						
						
						* Fix qmv for small or unaligned matrices
* Fix qmm 
						
						
					 
					
						2024-02-12 18:54:21 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7a34e46677 
					 
					
						
						
							
							Quantize with groups of 32 (#511)
						
						 
						
						
						
						
						* allow quantize with group sizes of 32
* missing cpu dispatch
* remove print
* Fix qvm for group_size 32
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
						
						
					 
					
						2024-01-21 06:19:05 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						c15fe3e61b 
					 
					
						
						
							
							Allow arbitrary first dimension in quantization kernels. (#458)
						
						 
						
						
						
						
						* Allow arbitrary first dim on qmm_t and qmv
* Allow arbitrary first dim on qmm and qvm
* Specialized aligned vs unaligned case
* Add more checks for valid quantizations 
						
						
					 
					
						2024-01-16 00:46:21 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e7f5059fe4 
					 
					
						
						
							
							Support for quantized matmul with w and w^T (#349)
						
						 
						
						
						
						
						* Add the metal qvm implementation
* Add qmm_n
* Add gradient wrt to input for quantized_matmul 
						
						
					 
					
						2024-01-03 14:22:36 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						447bc089b9 
					 
					
						
						
							
							Fix tolerance in de-/quantization test (#295)
						
						 
						
						
						
						
					 
					
						2023-12-26 19:21:05 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						b3916cbf2b 
					 
					
						
						
							
							Improve names of quantization arguments (#235)
						
						 
						
						
						
						
						* Change the default quantization group_size to 64
* Rename groups to group_size and width to bits 
						
						
					 
					
						2023-12-20 16:53:53 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						57fe918cf8 
					 
					
						
						
							
							Adds C++ and nn quantization utilities (#230)
						
						 
						
						
						
						
						* Add C++ de-/quantize ops
* Add quantize functions to the docs and tests
* Add a QuantizedLinear module 
						
						
					 
					
						2023-12-20 14:17:38 -08:00  
					
					
						 
						
						
							
							
							 
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						dfa9f4bc58 
					 
					
						
						
							
							An initial quantized matmul implementation (#205)
						
						 
						
						
						
						
						* Add quantized matvec
* Add quantized matrix matrix with 2nd matrix transposed
* Add quantized matmul tests
* Add a slow cpu quantized matmul
* Add a slightly faster vectorized cpu version 
						
						
					 
					
						2023-12-18 23:18:57 -08:00