toji 
							
						 
					 
					
						
						
							
						
						6768c6a54a 
					 
					
						
						
							
							Adding missing type hints  ( #1243 )  
						
						
						
						
						* added type hints for `run`, `tree_map` and `tree_map_with_path`
* fix lint
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
					 
					
						2024-07-23 07:29:38 -07:00 
						 
				 
			
				
					
						
							
							
								Tim Gymnich 
							
						 
					 
					
						
						
							
						
						6307d166eb 
					 
					
						
						
							
							Fix overflow / underflow handling for expm1f ( #1278 )  
						
						
						
						
						* Fix overflow / underflow handling for expm1f
* update tests 
						
						
					 
					
						2024-07-23 07:29:06 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1fba87b0df 
					 
					
						
						
							
							Fix leak with multi-output primitives ( #1274 )  
						
						
						
						
						* fix leak with multi-output primitives
* hopefully an actual fix 
						
						
					 
					
						2024-07-23 06:34:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8c01a7893b 
					 
					
						
						
							
							minor fix in optimizer + docs ( #1264 )  
						
						
						
						
					 
					
						2024-07-12 12:18:02 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						218047c75a 
					 
					
						
						
							
							docs fixes ( #1263 )  
						
						
						
						
					 
					
						2024-07-11 15:59:07 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						5c1fa64fb0 
					 
					
						
						
							
							Custom transforms ( #1246 )  
						
						
						
						
					 
					
						2024-07-10 18:00:01 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						a3c287354f 
					 
					
						
						
							
							Fast Hadamard Transform ( #1249 )  
						
						
						
						
						* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent 
						
						
					 
					
						2024-07-09 20:39:01 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						bdb36c9a63 
					 
					
						
						
							
							add zero vjps for bitwise ops and gather w.r.t. index ( #1256 )  
						
						
						
						
					 
					
						2024-07-07 21:34:59 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						20bb301195 
					 
					
						
						
							
							CPU binary reduction + Nits ( #1242 )  
						
						
						
						
						* very minor nits
* reduce binary
* fix test 
						
						
					 
					
						2024-06-28 13:50:42 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						b05bcfd27f 
					 
					
						
						
							
							Fixes segfault when compiling checkpointed functions ( #1235 )  
						
						
						
						
					 
					
						2024-06-26 16:14:45 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						2615660e62 
					 
					
						
						
							
							Fix strided sort bug ( #1236 )  
						
						
						
						
						* Use output strides in sort kernel
* fix zero strides bug 
						
						
					 
					
						2024-06-26 14:32:11 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5b0af4cdb1 
					 
					
						
						
							
							fix donation condition for compilation ( #1237 )  
						
						
						
						
					 
					
						2024-06-26 09:04:05 -07:00 
						 
				 
			
				
					
						
							
							
								David Koski 
							
						 
					 
					
						
						
							
						
						4eef1e8a3e 
					 
					
						
						
							
							fix typo ( #1215 )  
						
						
						
						
					 
					
						2024-06-24 13:36:35 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						95d11bda06 
					 
					
						
						
							
							Fix NumPy 2.0 pickle test ( #1221 )  
						
						
						
						
						* fix numpy version <2 temporarily
* typo
* better fix
* Fix just for bfloat16
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
						
						
					 
					
						2024-06-23 05:47:22 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						2d6cd47713 
					 
					
						
						
							
							Masked gemv ( #1211 )  
						
						
						
						
					 
					
						2024-06-14 09:52:26 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						df964132fb 
					 
					
						
						
							
							fix scatter + test ( #1202 )  
						
						
						
						
						* fix scatter + test
* fix test warnings
* fix metal validation 
						
						
					 
					
						2024-06-11 14:35:12 -07:00 
						 
				 
			
				
					
						
							
							
								Alex Barron 
							
						 
					 
					
						
						
							
						
						27d70c7d9d 
					 
					
						
						
							
							Feature complete Metal FFT ( #1102 )  
						
						
						
						
						* feature complete metal fft
* fix contiguity bug
* jit fft
* simplify rader/bluestein constant computation
* remove kernel/utils.h dep
* remove bf16.h dep
* format
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
						
						
					 
					
						2024-06-06 12:57:25 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0163a8e57a 
					 
					
						
						
							
							Add docs for the distributed namespace ( #1184 )  
						
						
						
						
					 
					
						2024-06-06 11:37:00 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						496315fe1d 
					 
					
						
						
							
							Fix scan ( #1188 )  
						
						
						
						
						* fix scan
* improve grid size
* fix cpu cummax 
						
						
					 
					
						2024-06-05 14:21:58 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0fe6895893 
					 
					
						
						
							
							Fix the hard-shrink test ( #1185 )  
						
						
						
						
					 
					
						2024-06-04 16:22:56 -07:00 
						 
				 
			
				
					
						
							
							
								Nikhil Mehta 
							
						 
					 
					
						
						
							
						
						0b7d71fd2f 
					 
					
						
						
							
							Add softmin, hardshrink, hardtanh ( #1180 )  
						
						
						
						
						---------
Co-authored-by: Nikhil Mehta <nikmehta@tesla.com>
						
						
					 
					
						2024-06-04 15:48:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						83b11bc58d 
					 
					
						
						
							
							Fix Metal API validation for empty concat ( #1183 )  
						
						
						
						
					 
					
						2024-06-04 13:17:08 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ea9090bbc4 
					 
					
						
						
							
							Add view op ( #1179 )  
						
						
						
						
						* add view primitive
* nit
* fix view 
						
						
					 
					
						2024-06-04 08:05:27 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						3de8ce3f3c 
					 
					
						
						
							
							In place all-reduce and forgiving init ( #1178 )  
						
						
						
						
					 
					
						2024-06-03 16:47:47 -07:00 
						 
				 
			
				
					
						
							
							
								Brian Keene 
							
						 
					 
					
						
						
							
						
						1865299a30 
					 
					
						
						
							
							Metal shaders for memory efficient self attention on large sequences ( #964 )  
						
						
						
						
						* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI 
						
						
					 
					
						2024-06-03 09:16:19 -07:00 
						 
				 
			
				
					
						
							
							
								Dominik Schlösser 
							
						 
					 
					
						
						
							
						
						3576b547c5 
					 
					
						
						
							
							Doc error for default for scale in SinusoidalPositionalEncoding ( #1174 )  
						
						
						
						
					 
					
						2024-06-02 13:42:45 -07:00 
						 
				 
			
				
					
						
							
							
								K Venkat Ramnan 
							
						 
					 
					
						
						
							
						
						ab977109db 
					 
					
						
						
							
							feat: Added dlpack device ( #1165 )  
						
						
						
						
						* feat: Added dlpack device
* feat: Added device_id to dlpack device
* feat: Added device_id to dlpack device
* doc: updated conversion docs
* doc: updated numpy.rst dlpack information
* doc: updated numpy.rst dlpack information
* Update docs/src/usage/numpy.rst
* Update docs/src/usage/numpy.rst
---------
Co-authored-by: Venkat Ramnan Kalyanakumar <venkatramnankalyanakumar@Venkats-MacBook-Air.local>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
						
						
					 
					
						2024-05-31 12:29:01 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						fd1c08137b 
					 
					
						
						
							
							stable cumprod grad at 0 ( #1167 )  
						
						
						
						
					 
					
						2024-05-31 12:28:42 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						76b6cece46 
					 
					
						
						
							
							Fix multi-block sort stride management ( #1169 )  
						
						
						
						
						* Fix multi-block sort stride management
* Add seed to tests 
						
						
					 
					
						2024-05-31 11:10:54 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						9f0df51f8d 
					 
					
						
						
							
							Fix matvec vector stride bug ( #1168 )  
						
						
						
						
					 
					
						2024-05-29 12:18:28 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e7a2a3dcd1 
					 
					
						
						
							
							Fix a couple bugs ( #1161 )  
						
						
						
						
						* fix jit reduce for RMS norm
* make strides a single buffer
* better eval error message
* fix compiling with inf and bf16
* fix cpu compile with bf16 
						
						
					 
					
						2024-05-28 15:18:18 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a87ef5bfc1 
					 
					
						
						
							
							fix broadcast bug in bitwise ops ( #1157 )  
						
						
						
						
					 
					
						2024-05-24 11:44:40 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						7e26fd8032 
					 
					
						
						
							
							Option to JIT steel gemm / conv ( #1139 )  
						
						
						
						
					 
					
						2024-05-23 18:07:34 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						eab2685c67 
					 
					
						
						
							
							Float mask update ( #1152 )  
						
						
						
						
						* Float mask update
* Update CPU impl 
						
						
					 
					
						2024-05-23 17:20:44 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						50dfb664db 
					 
					
						
						
							
							Comms ( #1097 )  
						
						
						
						
						* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed 
						
						
					 
					
						2024-05-23 17:04:02 -07:00 
						 
				 
			
				
					
						
							
							
								Rifur13 
							
						 
					 
					
						
						
							
						
						9401507336 
					 
					
						
						
							
							Add groups to 2-D convolutions ( #1129 )  
						
						
						
						
						* Added groups to 2-D convolutions. Only implemented for **some** specializations.
Also fixed 1D grouped convs with different kernel strides and added more tests.
* fix channels condition 
						
						
					 
					
						2024-05-22 20:01:44 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						eb8321d863 
					 
					
						
						
							
							list based indexing ( #1150 )  
						
						
						
						
					 
					
						2024-05-22 15:52:05 -07:00 
						 
				 
			
				
					
						
							
							
								Abe Leininger 
							
						 
					 
					
						
						
							
						
						79ef49b2c2 
					 
					
						
						
							
							add mx.trace ( #1143 ) ( #1147 )  
						
						
						
						
						* working c++ trace implementation
* updated throw + added overloads
* added python binding for trace function
* pre-commit reformatting
* add trace to docs
* resolve comments
* remove to_stream call 
						
						
					 
					
						2024-05-22 15:50:27 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d568c7ee36 
					 
					
						
						
							
							Rename block sparse ( #1149 )  
						
						
						
						
						* block_sparse_mm to gather_mm
* rename
* nit
* nit 
						
						
					 
					
						2024-05-22 07:48:34 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e6fecbb3e1 
					 
					
						
						
							
							Some fixes in docs ( #1141 )  
						
						
						
						
						* fixes in docs
* nit 
						
						
					 
					
						2024-05-20 11:51:47 -07:00 
						 
				 
			
				
					
						
							
							
								jlwitthuhn 
							
						 
					 
					
						
						
							
						
						7e5674d8be 
					 
					
						
						
							
							Treat 'minimum' differently in cosine decay ( #1138 )  
						
						
						
						
					 
					
						2024-05-20 08:00:48 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						fb71a82ada 
					 
					
						
						
							
							Fix copy bug with many dims ( #1137 )  
						
						
						
						
					 
					
						2024-05-17 21:10:03 -07:00 
						 
				 
			
				
					
						
							
							
								Luca Arnaboldi 
							
						 
					 
					
						
						
							
						
						b3ec792380 
					 
					
						
						
							
							Implemented Cholesky on CPU ( #1119 )  
						
						
						
						
					 
					
						2024-05-17 12:31:59 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						81dd33af66 
					 
					
						
						
							
							allow conversion to dlpack ( #1120 )  
						
						
						
						
					 
					
						2024-05-16 16:11:37 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e78a6518fa 
					 
					
						
						
							
							Block sparse qmm ( #1124 )  
						
						
						
						
					 
					
						2024-05-16 15:24:14 -07:00 
						 
				 
			
				
					
						
							
							
								Jacket 
							
						 
					 
					
						
						
							
						
						c417e42116 
					 
					
						
						
							
							[Fix] minor typo in default argument for argpartition's "axis" parameter ( #1125 )  
						
						
						
						
						According to the documentation, argpartition's axis parameter can be None, but due to a previous typo it could not actually accept a None value. 
						
						
					 
					
						2024-05-15 15:25:25 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						631dfbe673 
					 
					
						
						
							
							fix scatter index bug ( #1122 )  
						
						
						
						
					 
					
						2024-05-14 15:04:58 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						56a4eaed72 
					 
					
						
						
							
							Pass missing stream arg in array.flatten ( #1111 )  
						
						
						
						
					 
					
						2024-05-14 06:50:16 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						bf925d9dc7 
					 
					
						
						
							
							Move args in conv_general ( #1118 )  
						
						
						
						
						Also fix a typo that padding_lo is passed as padding_hi. 
						
						
					 
					
						2024-05-14 06:50:09 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						1a7ed5dcb6 
					 
					
						
						
							
							Fill vector with constructor instead of fill_n ( #1113 )  
						
						
						
						
					 
					
						2024-05-14 06:28:55 -07:00