Rifur13 
							
						 
					 
					
						
						
							
						
						c4a471c99d 
					 
					
						
						
							
							Add groups to Conv1d ( #948 )  
						
						... 
						
						
						
						* Add conv1d grouped convs on CPU
* Add GPU support
* Parallelize inside metal kernel
* clenaup
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* New unfold kernel + remove unused code
* Remove copy and refactor
* Update vjp and reuse steel gemm
* Fixed groups on cpu
* Fix metal validation
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com > 
						
						
					 
					
						2024-04-27 06:24:57 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						2e7c02d5cd 
					 
					
						
						
							
							Metal FFT for powers of 2 up to 2048 ( #915 )  
						
						... 
						
						
						
						* add Metal FFT for powers of 2
* skip GPU test on linux
* fix contiguity bug
* address comments
* Update mlx/backend/metal/fft.cpp
* Update mlx/backend/metal/fft.cpp
* fix bug in synch
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
Co-authored-by: Awni Hannun <awni@apple.com > 
						
						
					 
					
						2024-04-11 21:40:06 -07:00 
						 
				 
			
				
					
						
							
							
								Nripesh Niketan 
							
						 
					 
					
						
						
							
						
						ffff671273 
					 
					
						
						
							
							Update pre-commit hooks ( #984 )  
						
						
						
						
					 
					
						2024-04-11 07:27:53 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						913b19329c 
					 
					
						
						
							
							Add missing && when forwarding args ( #925 )  
						
						... 
						
						
						
						Without the && args would be copied and perfect forwarding won't work. 
						
						
					 
					
						2024-03-29 06:48:29 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						29221fa238 
					 
					
						
						
							
							Implement vjps for some primitives in the fast namespace ( #883 )  
						
						... 
						
						
						
						* Implement rope vjp in terms of rope
* RMSNormVJP primitive and kernel
* Add LayerNormVJP primitive and kernel 
						
						
					 
					
						2024-03-26 16:35:34 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						6ee1112f30 
					 
					
						
						
							
							Fix copy donation and add partial rope ( #881 )  
						
						
						
						
					 
					
						2024-03-22 17:28:26 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						6686e61ca4 
					 
					
						
						
							
							Reduce update ( #783 )  
						
						... 
						
						
						
						* Split reduction files to reduce compile times
* Add small and medium axis size specializations for row reductions
* Add non-row-reduction options for small and med kernels 
						
						
					 
					
						2024-03-04 19:09:51 -08:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						776c3d226d 
					 
					
						
						
							
							Convolution update  ( #651 )  
						
						... 
						
						
						
						* Init steel conv and update Conv primitive
* Update slow CPU implementation to support flipping and input dilation winograd conv routing
Co-authored-by: Awni Hannun <awni@apple.com > 
						
						
					 
					
						2024-02-28 20:11:16 -08:00 
						 
				 
			
				
					
						
							
							
								Rifur13 
							
						 
					 
					
						
						
							
						
						126c9869c8 
					 
					
						
						
							
							Implement the 'where' primitive for conditional selection ( #664 )  
						
						
						
						
					 
					
						2024-02-22 15:10:48 -08:00 
						 
				 
			
				
					
						
							
							
								Vijay Krish 
							
						 
					 
					
						
						
							
						
						972d9a3aea 
					 
					
						
						
							
							Up to 10x faster scatter. ( #709 )  
						
						... 
						
						
						
						* Faster scatter.
Add specialization for 1-d index tensors.
* Address review comments.
- Check for row contiguity of index, update tensors
  instead of checking strides.
- Add support for 1d specialization with col contiguous update
  tensor, along with a test.
* Nit1
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Nit2
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com > 
						
						
					 
					
						2024-02-21 11:09:30 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5798256fcf 
					 
					
						
						
							
							Shapeless compilation for some graphs ( #687 )  
						
						... 
						
						
						
						* shapeless compilation for some graphs
* update compile benchmark
* default compile a few activations
* buffer donation
* bugfix
* shapeless fix
* update tests to work for cpu and gpu fusion
* test kwargs
* add kwargs to compile
* Recompile when python arguments change
* no compile for tanh
* some constant tests
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com > 
						
						
					 
					
						2024-02-19 21:43:54 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ccf1645995 
					 
					
						
						
							
							Custom primitive + RoPE fat op ( #676 )  
						
						... 
						
						
						
						* extensions start
* rope custom op
* fix build
* docs + rope benchmark
* fix test
* Add a Metal kernel for RoPE
* Fix position of traditional
* transform tests
* Move rope computation to float and fix tests
* Fix the test and a typo
* change to fast
* fix no metal build
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com > 
						
						
					 
					
						2024-02-14 14:04:25 -08:00 
						 
				 
			
				
					
						
							
							
								Vijay Krish 
							
						 
					 
					
						
						
							
						
						2fdc2462c3 
					 
					
						
						
							
							Faster gather and scatter. ( #682 )  
						
						... 
						
						
						
						Reduce unnecessary integer ops, especially since
there kernels are integer bound.
Increase number of iterations for benchmarks for
better smoothing.
Github Issue #506 
Co-authored-by: Vijay Krishnamoorthy <vijay_krish@apple.com > 
						
						
					 
					
						2024-02-13 17:47:41 -08:00 
						 
				 
			
				
					
						
							
							
								Nripesh Niketan 
							
						 
					 
					
						
						
							
						
						0dbc4c7547 
					 
					
						
						
							
							feat: Update pre-commit-config.yaml ( #667 )  
						
						
						
						
					 
					
						2024-02-11 06:08:20 -08:00 
						 
				 
			
				
					
						
							
							
								Vijay Krish 
							
						 
					 
					
						
						
							
						
						06072601ce 
					 
					
						
						
							
							Scatter optimization : Eliminate 64b integer divide. ( #662 )  
						
						... 
						
						
						
						Launch 2D grid to eliminate divide and mod in device code,
since 64b integer division is very expensive.
Github Issue #506 
Co-authored-by: Vijay Krishnamoorthy <vijay_krish@apple.com > 
						
						
					 
					
						2024-02-10 08:49:51 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e319383ef9 
					 
					
						
						
							
							Faster gather ( #626 )  
						
						... 
						
						
						
						* faster gather
* update copyright 
						
						
					 
					
						2024-02-04 17:25:44 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						3c2f192345 
					 
					
						
						
							
							Propagate nans in binary ops ( #579 )  
						
						... 
						
						
						
						* propagate nans in binary ops
* handle empty matmul
* cpu minimum/maximum propagate nan
* benchmark maximum
* add min as well
* throw on negative indices with full
* verbose on linux
* fix matmul for zero K 
						
						
					 
					
						2024-01-29 11:19:38 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						86e0c79467 
					 
					
						
						
							
							remove stale benchmarks ( #527 )  
						
						
						
						
					 
					
						2024-01-22 22:17:58 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7a34e46677 
					 
					
						
						
							
							Quantize with groups of 32 ( #511 )  
						
						... 
						
						
						
						* allow quantize with group sizes of 32
* missing cpu dispatch
* remove print
* Fix qvm for group_size 32
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com > 
						
						
					 
					
						2024-01-21 06:19:05 -08:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						78102a47ad 
					 
					
						
						
							
							Update GEMM ( #424 )  
						
						... 
						
						
						
						* Organize and collect metal subroutine templates and elements in `metal/kernels/steel/`
* Update gemm elements for better performance 
* Add split-K specialization for gemm
* Add `addmm` primitive, op and bindings for fused matmul and bias addition 
* Update tests and benchmarks as needed 
						
						
					 
					
						2024-01-17 12:42:39 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						c15fe3e61b 
					 
					
						
						
							
							Allow arbitrary first dimension in quantization kernels. ( #458 )  
						
						... 
						
						
						
						* Allow arbitrary first dim on qmm_t and qmv
* Allow arbitrary first dim on qmm and qvm
* Specialized aligned vs unaligned case
* Add more checks for valid quantizations 
						
						
					 
					
						2024-01-16 00:46:21 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f099ebe535 
					 
					
						
						
							
							Multi output primitives ( #330 )  
						
						... 
						
						
						
						* Multi-output primitives
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com > 
						
						
					 
					
						2024-01-08 16:39:08 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e7f5059fe4 
					 
					
						
						
							
							Support for quantized matmul with w and w^T ( #349 )  
						
						... 
						
						
						
						* Add the metal qvm implementation
* Add qmm_n
* Add gradient wrt to input for quantized_matmul 
						
						
					 
					
						2024-01-03 14:22:36 -08:00 
						 
				 
			
				
					
						
							
							
								Josh Soref 
							
						 
					 
					
						
						
							
						
						44c1ce5e6a 
					 
					
						
						
							
							Spelling ( #342 )  
						
						... 
						
						
						
						* spelling: accumulates
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: across
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: additional
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: against
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: among
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: array
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: at least
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: available
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: axes
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: basically
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: bfloat
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: bounds
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: broadcast
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: buffer
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: class
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: coefficients
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: collision
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: combinations
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: committing
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: computation
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: consider
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: constructing
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: conversions
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: correctly
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: corresponding
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: declaration
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: default
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: dependency
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: destination
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: destructor
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: dimensions
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: divided
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: element-wise
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: elements
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: endianness
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: equivalent
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: explicitly
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: github
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: indices
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: irregularly
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: memory
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: metallib
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: negative
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: notable
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: optional
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: otherwise
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: overridden
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: partially
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: partition
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: perform
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: perturbations
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: positively
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: primitive
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: repeat
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: repeats
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: respect
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: respectively
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: result
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: rounding
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: separate
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: skipping
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: structure
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: the
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: transpose
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: unnecessary
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: unneeded
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
* spelling: unsupported
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com >
---------
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com > 
						
						
					 
					
						2024-01-01 21:08:17 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						9e6b8c9f48 
					 
					
						
						
							
							Refactor the reduction kernels ( #277 )  
						
						
						
						
					 
					
						2023-12-24 14:47:57 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						dfa9f4bc58 
					 
					
						
						
							
							An initial quantized matmul implementation ( #205 )  
						
						... 
						
						
						
						* Add quantized matvec
* Add quantized matrix matrix with 2nd matrix transposed
* Add quantized matmul tests
* Add a slow cpu quantized matmul
* Add a slightly faster vectorized cpu version 
						
						
					 
					
						2023-12-18 23:18:57 -08:00 
						 
				 
			
				
					
						
							
							
								Diogo 
							
						 
					 
					
						
						
							
						
						02de234ef0 
					 
					
						
						
							
							Activations LeakyReLU / PReLU / Softplus / Mish ( #109 )  
						
						... 
						
						
						
						* Leaky_relu / prelu / softplus / mish
* added tests
* updated bench
* remove torch refs, add init to PReLU
* added arvix reference to mish
* added missing docs 
						
						
					 
					
						2023-12-11 19:40:57 -08:00 
						 
				 
			
				
					
						
							
							
								Nicholas Santavas 
							
						 
					 
					
						
						
							
						
						f5df47ec6e 
					 
					
						
						
							
							Add Step, ELU, SELU, Swish activation functions ( #117 )  
						
						... 
						
						
						
						* Add Step, ELU, SELU, Swish activation functions
This commit adds the Step, ELU, SELU and Swish activations functions
* add to the docs
* review 
						
						
					 
					
						2023-12-11 17:04:07 -08:00 
						 
				 
			
				
					
						
							
							
								Jason 
							
						 
					 
					
						
						
							
						
						b0cd092b7f 
					 
					
						
						
							
							Added activation functions: leaky_relu relu6 softplus elu celu logsigmoid ( #108 )  
						
						... 
						
						
						
						* added leaky_relu relu6 softplus elu celu logsigmoid
* minor fixes for docstring and benchmark imports
* fixed elu implementation and added tests
* added tests for optional param, changed leaky_relu param to fit pytorch documentation 
						
						
					 
					
						2023-12-10 16:31:38 -08:00 
						 
				 
			
				
					
						
							
							
								Zach Schillaci 
							
						 
					 
					
						
						
							
						
						5b9be57ac3 
					 
					
						
						
							
							Add isort pre-commit and run ( #68 )  
						
						
						
						
					 
					
						2023-12-08 11:31:47 -08:00 
						 
				 
			
				
					
						
							
							
								Yingbo Ma 
							
						 
					 
					
						
						
							
						
						36b245b287 
					 
					
						
						
							
							Fix benchmark example ( #11 )  
						
						
						
						
					 
					
						2023-12-06 07:17:16 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						46a39e5b1f 
					 
					
						
						
							
							copyright + ack  
						
						
						
						
					 
					
						2023-11-30 11:12:53 -08:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						e6306cfee9 
					 
					
						
						
							
							jagrit's commit files  
						
						
						
						
					 
					
						2023-11-29 10:52:08 -08:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						d1f86272a2 
					 
					
						
						
							
							angelos's commit files  
						
						
						
						
					 
					
						2023-11-29 10:42:59 -08:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8ca7f9e8e9 
					 
					
						
						
							
							awni's commit files  
						
						
						
						
					 
					
						2023-11-29 10:30:41 -08:00