Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						fcda3a0e66 
					 
					
						
						
							
							Increase test tolerance for fast.layer_norm ( #880 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-22 12:10:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						9663c22fe9 
					 
					
						
						
							
							Do not store iostream in shared_ptr ( #872 )  
						
						 
						
						
						
						
						There is no need to store iostream in shared_ptr; doing so adds the cost
of a heap allocation.
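
To illustrate the trade-off, here is a minimal sketch. `SharedReader` and `ValueReader` are hypothetical names, not types from the repository, and `std::istringstream` stands in for whatever stream the real code holds. Keeping the stream behind a `std::shared_ptr` requires a separate heap allocation for the stream object, while keeping it as a direct member does not.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical reader that owns its stream through a shared_ptr.
// Constructing it performs an extra heap allocation for the stream object.
class SharedReader {
 public:
  explicit SharedReader(const std::string& data)
      : is_(std::make_shared<std::istringstream>(data)) {}

  // Read the next whitespace-delimited word.
  std::string next() {
    std::string word;
    *is_ >> word;
    return word;
  }

 private:
  std::shared_ptr<std::istringstream> is_;
};

// Same interface, but the stream is stored directly as a member, so the
// stream object lives inside the reader itself.
class ValueReader {
 public:
  explicit ValueReader(const std::string& data) : is_(data) {}

  // Read the next whitespace-delimited word.
  std::string next() {
    std::string word;
    is_ >> word;
    return word;
  }

 private:
  std::istringstream is_;
};
```

One consequence of the by-value version is that the reader becomes movable but not copyable, because standard streams cannot be copied.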
						
						
							
						
					 
					
						2024-03-22 06:54:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						f0ae00da12 
					 
					
						
						
							
							Reduce implicit copies in make_array ( #874 )  
						
						 
						
						
						
						
						1. Move shapes into outputs instead of copying them.
2. Pass primitive by const ref as it is always copied into outputs, which
   removes a copy when calling make_array. 
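
The two points above can be sketched as follows. This is not the repository's `make_array`; `Primitive`, `Output` and `make_outputs` are hypothetical stand-ins chosen only to show the pattern: the shapes are taken by value and moved into the outputs, and the primitive is taken by const reference because each output stores its own copy of the pointer anyway.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operation attached to an output.
struct Primitive {};

// Hypothetical output record: owns its shape and shares the primitive.
struct Output {
  std::vector<int> shape;
  std::shared_ptr<Primitive> primitive;
};

// `shapes` is a sink parameter: each inner vector is moved into its output
// instead of being copied. `primitive` is passed by const reference, so the
// only shared_ptr copies made are the ones stored in the outputs.
std::vector<Output> make_outputs(
    std::vector<std::vector<int>> shapes,
    const std::shared_ptr<Primitive>& primitive) {
  std::vector<Output> outputs;
  outputs.reserve(shapes.size());
  for (auto& shape : shapes) {
    outputs.push_back(Output{std::move(shape), primitive});
  }
  return outputs;
}
```

A caller that no longer needs its shapes can pass `std::move(shapes)` so that no shape vector is copied at all.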
						
						
							
						
					 
					
						2024-03-22 06:29:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						44390bd3d0 
					 
					
						
						
							
							Bump ( #869 )  
						
						 
						
						
						
						
						* bump
* fix none in a few ops 
						
						
							
  v0.8.0
 
						
					 
					
						2024-03-21 13:56:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						2225374060 
					 
					
						
						
							
							Adds mx.fast.layer_norm ( #870 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-21 13:55:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								nicolov 
							
						 
					 
					
						
						
							
						
						105d236889 
					 
					
						
						
							
							Add vmap for SVD and inverse ( #849 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-21 13:18:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						53e6a9367c 
					 
					
						
						
							
							Use reshape and transpose for non-overlapping pooling windows ( #867 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-21 10:21:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Chime Ogbuji 
							
						 
					 
					
						
						
							
						
						f5a1582fe8 
					 
					
						
						
							
							Add minimum for cosine decay function ( #859 )  
						
						 
						
						
						
						
						* Add minimum for cosine decay function
* Update python/mlx/optimizers/schedulers.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
						
						
							
						
					 
					
						2024-03-21 07:33:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a54f06b16f 
					 
					
						
						
							
							Fast RMS Norm ( #862 )  
						
						 
						
						
						
						
						* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Do the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
						
						
							
						
					 
					
						2024-03-21 07:20:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						4650d94d98 
					 
					
						
						
							
							Add missing && in eval ( #864 )  
						
						 
						
						
						
						
						Without the &&, args would be copied and perfect forwarding would not work.
To avoid eval calling itself recursively, the vector version of eval is
changed to take its argument by value instead, which also saves a copy when
an rvalue is passed.
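
The overload interaction described above can be sketched like this, assuming C++17. `Arr` is a hypothetical stand-in for an array handle that counts its copies, and these `eval` overloads only mirror the shape of the change, not the repository's actual signatures. Because the vector overload takes its parameter by value, a call with a vector argument resolves to it rather than back to the variadic template.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array handle; counts how often it is copied.
struct Arr {
  static inline int copies = 0;
  Arr() = default;
  Arr(const Arr&) { ++copies; }
  Arr(Arr&&) noexcept = default;
};

// Vector overload: takes by value, so an rvalue vector is moved in and no
// element is copied.
std::size_t eval(std::vector<Arr> outputs) {
  return outputs.size();
}

// Variadic overload: `Arrays&&...` are forwarding references, so lvalue
// arguments are copied into the vector and rvalue arguments are moved.
template <typename... Arrays>
std::size_t eval(Arrays&&... outputs) {
  std::vector<Arr> all;
  all.reserve(sizeof...(outputs));
  (all.emplace_back(std::forward<Arrays>(outputs)), ...);
  // Both overloads match a vector argument equally well, and the
  // non-template one is preferred, so this call does not recurse.
  return eval(std::move(all));
}
```

If the vector overload took `const std::vector<Arr>&` instead, the call `eval(std::move(all))` would prefer the template (an rvalue binds better to `&&` than to a const lvalue reference) and recurse, which is the situation the commit message describes avoiding.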
						
						
							
						
					 
					
						2024-03-21 06:15:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						a5681ebc52 
					 
					
						
						
							
							Update set item ( #861 )  
						
						 
						
						
						
						
						* Update mlx_set_item to handle regular slices without expanding
* Refactor ellipsis handling
* Route mlx_set_item to slice_update where possible
* Update mlx_scatter_args_slice
* Don't route to gather if no array indices 
						
						
							
						
					 
					
						2024-03-21 02:48:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						e849b3424a 
					 
					
						
						
							
							Do not use static constexpr in header ( #863 )  
						
						 
						
						
						
						
						Doing so results in each compilation unit (.cpp file) having its own
copy of the variable, while inline constexpr makes sure there is only
one copy. 
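
A small sketch of the difference, assuming C++17. The constant names and the helper function are illustrative only and do not come from the repository; the block shows what a header might contain.

```cpp
// Contents of a hypothetical header that is included from several .cpp files.

// Internal linkage: every translation unit that includes the header gets its
// own separate object, so its address can differ between .cpp files.
static constexpr int kMaxDimsStatic = 8;

// C++17 inline variable: external linkage, and all translation units share a
// single object with a single address.
inline constexpr int kMaxDimsInline = 8;

// Helper used to observe the address of the shared object.
inline const int* max_dims_address() {
  return &kMaxDimsInline;
}
```

For a plain `int` the duplicated copies are usually optimized away, so the difference matters most for larger constants (tables, strings) or when code relies on the address being the same everywhere.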
						
						
							
						
					 
					
						2024-03-20 21:28:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						b219d12a6b 
					 
					
						
						
							
							Check edge case handling in row reduce med kernel ( #858 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-20 11:37:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						cec8661113 
					 
					
						
						
							
							Add a SliceUpdate op and primitive ( #850 )  
						
						 
						
						
						
						
						* Enable copy to work with int64 strides
* Fix uniform buffer indices or copy kernel arguments
* Update utils.h
* Remove manual unrolling of elem to loc loop
* GPU copy updated to handle negative strides
* Add slice update primitive 
						
						
							
						
					 
					
						2024-03-20 10:39:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						73a8c090e0 
					 
					
						
						
							
							Pass shape and inputs by value in array's constructor ( #853 )  
						
						 
						
						
						
						
						Since the shape and inputs are always saved as copies in ArrayDesc, we can
unify array's constructors to just take the arguments by value.
There are 2 cases:
1. When shape is an lvalue, it will be copied into array's constructor and
   then moved into ArrayDesc's member. So only 1 copy happens.
2. When shape is an rvalue, it will be moved into array's constructor and
   then moved into ArrayDesc's member. So no copy happens.
So having 1 constructor that takes by value is equivalent to having 2
constructors that take a const reference and an rvalue reference separately.
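
The two cases can be checked with a small sketch, assuming C++17. `Tracked` and `Desc` are hypothetical types standing in for the shape and for ArrayDesc; `Tracked` simply counts how often it is copied and moved.

```cpp
#include <utility>

// Hypothetical stand-in for a shape; counts copies and moves.
struct Tracked {
  static inline int copies = 0;
  static inline int moves = 0;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};

// Hypothetical descriptor with a single by-value ("sink") constructor.
// The parameter is copied or moved in by the caller, then moved into the
// member, so no second copy is ever made.
struct Desc {
  explicit Desc(Tracked shape) : shape(std::move(shape)) {}
  Tracked shape;
};
```

Constructing `Desc` from an lvalue costs one copy plus one move, and constructing it from `std::move(x)` costs two moves and no copy. Compared with a pair of const-reference and rvalue-reference overloads this is one extra move per call, which is cheap for types such as `std::vector`.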
						
						
							
						
					 
					
						2024-03-20 07:54:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Md. Rasel Mandol 
							
						 
					 
					
						
						
							
						
						db6796ac61 
					 
					
						
						
							
							simple typo fille ( #848 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-19 06:15:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						9a8ee00246 
					 
					
						
						
							
							Switch to nanobind ( #839 )  
						
						 
						
						
						
						
						* mostly builds
* most tests pass
* fix circle build
* add back buffer protocol
* includes
* fix for py38
* limit to cpu device
* include
* fix stubs
* move signatures for docs
* stubgen + docs fix
* doc for compiled function, comments 
						
						
							
						
					 
					
						2024-03-18 20:12:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						d39ed54f8e 
					 
					
						
						
							
							Some C++ code is not needed ( #841 )  
						
						 
						
						
						
						
						1. An anonymous namespace already gives internal linkage, so the static
   keyword is not needed.
2. The default constructor of std::shared_ptr initializes the pointer to
   nullptr, so there is no need to set it explicitly.
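
Both points can be seen in a short sketch. `helper` and `Holder` are hypothetical names used only for illustration.

```cpp
#include <memory>

namespace {

// Functions in an anonymous namespace already have internal linkage,
// so adding `static` here would be redundant.
int helper() {
  return 42;
}

} // namespace

// A default-constructed std::shared_ptr is already empty (equal to nullptr),
// so no explicit `= nullptr` initializer is needed on the member.
struct Holder {
  std::shared_ptr<int> ptr;
};
```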
						
						
							
						
					 
					
						2024-03-18 17:04:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						16546c70d8 
					 
					
						
						
							
							No reshape rope ( #838 )  
						
						 
						
						
						
						
						* no reshape rope
* no reshape rope 
						
						
							
						
					 
					
						2024-03-18 17:03:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								nicolov 
							
						 
					 
					
						
						
							
						
						eaba55c9bf 
					 
					
						
						
							
							Add matrix inversion primitive ( #822 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-15 06:34:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						19ec023256 
					 
					
						
						
							
							vmap matmul and addmm ( #836 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-14 14:38:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						63ab0ab580 
					 
					
						
						
							
							version ( #835 )  
						
						 
						
						
						
						
							
  v0.7.0
 
						
					 
					
						2024-03-14 12:20:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						8dfc376c00 
					 
					
						
						
							
							Strided reduce specialization for small reductions ( #826 )  
						
						 
						
						
						
						
						* Add small column / general reduction specialization 
						
						
							
						
					 
					
						2024-03-14 09:16:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						1efee9db09 
					 
					
						
						
							
							Add types and order in kernel name ( #831 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-13 20:34:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						43abc402d8 
					 
					
						
						
							
							route to fallback ( #828 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-13 19:56:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						3f8b1668c4 
					 
					
						
						
							
							Make reshape faster for row_contiguous cases ( #829 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-13 16:22:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						76c919b4ec 
					 
					
						
						
							
							NumberOfElements for shapeless compile and vmap fixes ( #802 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-13 10:34:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						29d0c10ee5 
					 
					
						
						
							
							Reshape improvement ( #818 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-12 17:54:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						5ad133f8bb 
					 
					
						
						
							
							No copy gems ( #801 )  
						
						 
						
						
						
						
						* Enable collapsing batch dims in gemm
* Update gemm to only make copies when neither of the last 2 axes are contiguous
* Update addmm to support gemv shapes
* Update addmm to support irregular batch strides
* Update tests 
						
						
							
						
					 
					
						2024-03-12 13:13:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								nicolov 
							
						 
					 
					
						
						
							
						
						d0c544a868 
					 
					
						
						
							
							Add SVD primitive ( #809 )  
						
						 
						
						
						
						
						Add SVD op using Accelerate's LAPACK following
https://developer.apple.com/documentation/accelerate/compressing_an_image_using_linear_algebra
Co-authored-by: Nicolo Valigi <nvaligi@apple.com>
						
						
							
						
					 
					
						2024-03-12 12:30:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Daniel Falbel 
							
						 
					 
					
						
						
							
						
						ffb19df3c0 
					 
					
						
						
							
							Fix docstring for correctly rendering ( #820 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-12 11:46:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8b7532b9ab 
					 
					
						
						
							
							fix scatter ( #821 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-12 11:42:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						366478c560 
					 
					
						
						
							
							fix modules with dict ( #819 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-12 08:54:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Justin Deschenaux 
							
						 
					 
					
						
						
							
						
						8e5600022a 
					 
					
						
						
							
							Implement RNN, GRU, LSTM ( #268 )  
						
						 
						
						
						
						
						* RNN base implementation
* Address comments+format
* nits in docs
* add tests for prb
* fix test
* add a couple tests
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2024-03-11 21:14:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						0e95b64942 
					 
					
						
						
							
							Fix bug in tape order during simplify ( #816 )  
						
						 
						
						
						
						
						* fix bug in tape order during simplify
* properly fix compile
* last bug 
						
						
							
						
					 
					
						2024-03-11 17:29:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								nicolov 
							
						 
					 
					
						
						
							
						
						0ae22b915b 
					 
					
						
						
							
							Remove code duplication in reduce ops ( #793 )  
						
						 
						
						
						
						
						* Remove code duplication in reduce ops
* Remove the unnecessary lambda
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
						
						
							
						
					 
					
						2024-03-11 10:57:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7c441600fe 
					 
					
						
						
							
							Compile stride bug ( #812 )  
						
						 
						
						
						
						
						* fix compile stride bug
* revert sdpa fix
* fix cpu
* fix bug with simplifying outputs 
						
						
							
						
					 
					
						2024-03-11 06:31:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a4d290adb9 
					 
					
						
						
							
							Remove depth traversal ( #813 )  
						
						 
						
						
						
						
						* no depth traversal
* counter outside loop 
						
						
							
						
					 
					
						2024-03-09 20:21:32 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						28301807c2 
					 
					
						
						
							
							Version bump and os error ( #807 )  
						
						 
						
						
						
						
							
  v0.6.0
 
						
					 
					
						2024-03-07 13:57:58 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						74ed0974b3 
					 
					
						
						
							
							Support 13.0+ with xcode 14.3 ( #806 )  
						
						 
						
						
						
						
						* Support 13.0+ with xcode 14.3
* revert revert 
						
						
							
						
					 
					
						2024-03-07 13:27:57 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						ec8a4864fa 
					 
					
						
						
							
							Fix SDPA kernel bug on Mac OS 13.3 SDK ( #805 )  
						
						 
						
						
						
						
						* Move sdpa kernel to allocate tgp mem statically and allow macOS 13.3 SDK builds
* Style 
						
						
							
						
					 
					
						2024-03-07 10:18:09 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b7588fd5d7 
					 
					
						
						
							
							fix inplace to not make a shallow copy ( #804 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-07 09:34:11 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f512b905c7 
					 
					
						
						
							
							Minimum xcode / sdk ( #800 )  
						
						 
						
						
						
						
						* minimum xcode /sdk
* try multiple xcode versions in CI
* update python
* metal validation for python tests 
						
						
							
						
					 
					
						2024-03-07 08:19:43 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						afd5274049 
					 
					
						
						
							
							route to fallback for bfloat ( #794 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-06 15:39:12 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1074674e32 
					 
					
						
						
							
							Add a maximum graph depth ( #797 )  
						
						 
						
						
						
						
						* add a maximum graph depth
* remember how to use C++ 
						
						
							
						
					 
					
						2024-03-06 15:39:00 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								AlexCheema 
							
						 
					 
					
						
						
							
						
						7762e07fde 
					 
					
						
						
							
							Update function_transforms.rst ( #796 )  
						
						 
						
						
						
						
						Fix typo in function_transforms.rst 
						
						
							
						
					 
					
						2024-03-06 12:03:37 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Luca Arnaboldi 
							
						 
					 
					
						
						
							
						
						cbefd9129e 
					 
					
						
						
							
							Implementation of pickle, copy and deepcopy for Python arrays ( #300  &  #367 ). ( #713 )  
						
						 
						
						
						
						
						* Implemented pickling and copy for Python arrays (#300 & #367)
* Fixing typos
* Pickle with NumPy arrays
* Pickle: workaround for bfloat16
* Revert "Pickle: workaround for bfloat16"
This reverts commit 25afe6bc09.
* Added an error when pickling bfloat16
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* clang-format applied
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
						
						
							
						
					 
					
						2024-03-06 08:02:41 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e39bebe13e 
					 
					
						
						
							
							Fix reshaping of empty arrays ( #791 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-05 23:33:22 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						14b4e51a7c 
					 
					
						
						
							
							Improved quantized matrix vector product ( #786 )  
						
						 
						
						
						
						
							
						
					 
					
						2024-03-05 17:32:19 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						cbcf44a4ca 
					 
					
						
						
							
							Some fixes in cache / thread safety ( #777 )  
						
						 
						
						
						
						
						* some fixes in cache / thread safety
* speed up no cache case
* fix opt test
* optimizer docs
* optimizer docs
* fix adafactor
* fix adafactor 
						
						
							
						
					 
					
						2024-03-05 13:30:50 -08:00