Awni Hannun 
							
						 
					 
					
						
						
							
						
						f8b6f8a3dc 
					 
					
						
						
							
							add test  
						
						
						
						
							
						
					 
					
						2025-10-16 07:41:22 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c473719b23 
					 
					
						
						
							
							fix compile when compiling multiple lambdas with the same capture  
						
						
						
						
							
						
					 
					
						2025-10-16 07:27:42 -07:00 
						 
				 
			
				
					
						
							
							
								Anastasiia Filippova 
							
						 
					 
					
						
						
							
						
						e9eab527eb 
					 
					
						
						
							
							Nccl timeout (#2673)

						* print the error & delete nccl group
						* timeout for nccl binding
						* typo
						* revert error
						* fixed a typo
						
						
							
						
					 
					
						2025-10-14 12:29:54 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						36ca62dba8 
					 
					
						
						
							
							remove unused unary file (#2672)
						
						
						
						
							
						
					 
					
						2025-10-13 19:36:26 -07:00 
						 
				 
			
				
					
						
							
							
								Manuel Villanueva 
							
						 
					 
					
						
						
							
						
						9cbb1b0148 
					 
					
						
						
							
							Modified sort behavior when running CPU or Metal to match NumPy/JAX (#2667)

						* Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior.
						* Modified sort behavior when running CPU or Metal to match NumPy/JAX
						* nits
						---------
						Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-10-13 14:36:45 -07:00 
						 
				 
			
				
					
						
							
							
								Fabrizio Milo 
							
						 
					 
					
						
						
							
						
						9bfc476d72 
					 
					
						
						
							
							Normalize README bullet formatting (#2671)
						
						
						
						
							
						
					 
					
						2025-10-13 12:13:30 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						25e2356316 
					 
					
						
						
							
							speed up scalars (#2669)
						
						
						
						
							
						
					 
					
						2025-10-13 12:10:15 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						226a1d24e0 
					 
					
						
						
							
							Debug cuda conv (#2662)

						* use t4
						* use t4
						
						
							
						
					 
					
						2025-10-10 16:12:47 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						630350ad3e 
					 
					
						
						
							
							Precise sigmoid (#2659)

						* bump patch
						* Sigmoid matches PyTorch and is more precise on tails
						
						
							
						
					 
					
						2025-10-10 10:05:23 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						380aeb58ae 
					 
					
						
						
							
							enable admm low-precision cpu (#2661)
						
						
						
						
							
						
					 
					
						2025-10-10 09:50:54 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f37389d100 
					 
					
						
						
							
							bump patch (#2658)
						
						
						
						
							
						
					 
					
						2025-10-10 08:36:41 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e89e8b4272 
					 
					
						
						
							
							Export with callback (#2612)

						* export with callback
						* export with callback
						* Add types, fix kwarg ordering bug + test
						* cleanup, test, fix
						* typos
						
						
							
						
					 
					
						2025-10-08 19:24:33 -07:00 
						 
				 
			
				
					
						
							
							
								AN Long 
							
						 
					 
					
						
						
							
						
						85a8824a8c 
					 
					
						
						
							
							Fix cumulative operations when axis=None (#2653)
						
						
						
						
							
						
					 
					
						2025-10-08 15:25:38 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f5d4397e5c 
					 
					
						
						
							
							Fix fast synch when fence is waited before a command buffer is created (#2657)
						
						
						
						
							
						
					 
					
						2025-10-08 11:23:46 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						343e33b6d5 
					 
					
						
						
							
							fix all_gather vjp (#2654)
						
						
						
						
							
						
					 
					
						2025-10-07 06:05:23 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0073096dd1 
					 
					
						
						
							
							Split name into directories for cuda jit (#2656)
						
						
						
						
							
						
					 
					
						2025-10-07 01:52:58 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e3d004fed9 
					 
					
						
						
							
							Fix and refactor row-reduce (#2650)
						
						
						
						
							
						
					 
					
						2025-10-07 01:51:08 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a393435d28 
					 
					
						
						
							
							Speed up compile for node with many parents (#2649)
						
						
						
						
							
						
					 
					
						2025-10-03 19:30:36 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a7a94b29d7 
					 
					
						
						
							
							Fix compile when outputs change (#2648)
						
						
						
						
							
						
					 
					
						2025-10-03 08:40:57 -07:00 
						 
				 
			
				
					
						
							
							
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						22a5da76c8 
					 
					
						
						
							
							Faster complex matmul (#2571)
						
						
						
						
							
						
					 
					
						2025-10-02 23:33:15 -07:00 
						 
				 
			
				
					
						
							
							
								Andrey Portnoy 
							
						 
					 
					
						
						
							
						
						287c63a093 
					 
					
						
						
							
							Configure CMake to export compile_commands.json (#2645)

						This helps enable LSP for code navigation using clangd.
						
						
							
						
					 
					
						2025-10-02 15:40:32 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1c9ae1eaa1 
					 
					
						
						
							
							cuda fix flaky test (#2646)
						
						
						
						
							
						
					 
					
						2025-10-02 15:40:04 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						c2c3e0b0a2 
					 
					
						
						
							
							[CUDA] Add a small column specialization to reduce (#2642)
						
						
						
						
							
						
					 
					
						2025-10-02 14:41:05 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b0cc71ae71 
					 
					
						
						
							
							Faster triu, tril, where with scalar (#2644)
						
						
						
						
							
						
					 
					
						2025-10-02 12:21:27 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e88f2d4a8e 
					 
					
						
						
							
							fix cross entropy axis param (#2641)

						* fix cross entropy axis param
						* faster grad clipping
						
						
							
						
					 
					
						2025-10-01 16:49:55 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						9cee557423 
					 
					
						
						
							
							Fix status message (#2638)
						
						
						
						
							
						
					 
					
						2025-10-01 16:43:45 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bbf1423953 
					 
					
						
						
							
							wait for tasks in cuda (#2636)
						
						
						
						
							
						
					 
					
						2025-09-30 16:08:46 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						eb24267b56 
					 
					
						
						
							
							Compile now can attach arbitrary data to an entry (#2634)
						
						
						
						
							
						
					 
					
						2025-09-30 13:33:27 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dc371ae7a5 
					 
					
						
						
							
							fix for max block dim (#2631)
						
						
						
						
							
						
					 
					
						2025-09-29 08:59:25 -07:00 
						 
				 
			
				
					
						
							
							
								AN Long 
							
						 
					 
					
						
						
							
						
						e76a8dd5c5 
					 
					
						
						
							
							Fix incorrect path and typos (#2630)
						
						
						
						
							
						
					 
					
						2025-09-28 06:03:04 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						b466dea982 
					 
					
						
						
							
							[CUDA] Make CudaEvent work with multi-device (#2614)

						* Set current device when creating cuda event
						* Separate cuda events by device
						* Avoid race condition in pool
						
						
							
						
					 
					
						2025-09-27 11:27:17 +09:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						7a6adda1e6 
					 
					
						
						
							
							Bump the version (#2627)
						
						
						
						
							
 
						
					 
					
						2025-09-26 15:15:28 -07:00 
						 
				 
			
				
					
						
							
							
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						1a9f820af6 
					 
					
						
						
							
							Compiled should not end in broadcast (#2622)
						
						
						
						
							
						
					 
					
						2025-09-26 13:36:09 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d4f4ff3c5e 
					 
					
						
						
							
							Allow None input to compiled functions (#2621)

						* Allow None input to compiled functions
						* Allow None input to compiled functions
						
						
							
						
					 
					
						2025-09-25 08:42:23 -07:00 
						 
				 
			
				
					
						
							
							
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						7c7e48dbd1 
					 
					
						
						
							
							New tuning for small K gemv (#2620)

						* New tuning for small K gemv
						
						
							
						
					 
					
						2025-09-23 12:28:35 -07:00 
						 
				 
			
				
					
						
							
							
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						fbbf3b9b3e 
					 
					
						
						
							
							Support pickling array for bfloat16 (#2586)

						* add bfloat16 pickling
						* Improvements
						* improve
						---------
						Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
						
						
							
						
					 
					
						2025-09-22 20:12:15 -07:00 
						 
				 
			
				
					
						
							
							
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						bf01ad9367 
					 
					
						
						
							
							fix (#2613)

						Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
						
						
							
						
					 
					
						2025-09-22 20:12:04 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						ae438d05fa 
					 
					
						
						
							
							[CUDA] Recycle CUDA events (#2604)

						* Make CudaEvent a CudaHandle
						* Add caching for CudaEvent
						* Make sure cuda events are destroyed at last
						* Fix headers
						* SharedEvent => AtomicEvent
						* RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent
						* Remove unneeded asserts
						
						
							
						
					 
					
						2025-09-23 10:42:03 +09:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						711a645807 
					 
					
						
						
							
							avoid producing NaN in attention (#2608)
						
						
						
						
							
						
					 
					
						2025-09-22 13:10:43 -07:00 
						 
				 
			
				
					
						
							
							
								Josh Bleecher Snyder 
							
						 
					 
					
						
						
							
						
						aa9d44b3d4 
					 
					
						
						
							
							implement Convolution::output_shape (#2601)

						- pull conv_out_shape out for re-use
						- add Conv::output_shape
						- add e2e python tests confirming shapeless=True support and correctness
						Updates #2599
						
						
							
						
					 
					
						2025-09-22 10:09:45 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec2ab42888 
					 
					
						
						
							
							Lower sorted QMM gather threshold (#2609)
						
						
						
						
							
						
					 
					
						2025-09-19 18:22:55 -07:00 
						 
				 
			
				
					
						
							
							
								Cheng 
							
						 
					 
					
						
						
							
						
						787c0d90cd 
					 
					
						
						
							
							Detect cache thrashing in LRUCache (#2600)

						* Detect cache thrashing in LRUCache
						* Do not check cache thrashing in tests
						
						
							
						
					 
					
						2025-09-19 09:12:14 +09:00 
						 
				 
			
				
					
						
							
							
								Oleksandr Bilous 
							
						 
					 
					
						
						
							
						
						e8b604a6a3 
					 
					
						
						
							
							fix: library loading for swift dynamic frameworks (#2568)
						
						
						
						
							
						
					 
					
						2025-09-18 13:54:59 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						50cc09887f 
					 
					
						
						
							
							expose depends (#2606)
						
						
						
						
							
						
					 
					
						2025-09-18 10:06:15 -07:00 
						 
				 
			
				
					
						
							
							
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						3f730e77aa 
					 
					
						
						
							
							Update export function example for array input (#2598)

						After changing the shape to conform (same shapes for all objects), the example works.
						
						
							
						
					 
					
						2025-09-16 14:38:05 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						caecbe876a 
					 
					
						
						
							
							no copy batch rope (#2595)
						
						
						
						
							
						
					 
					
						2025-09-15 14:23:48 -07:00 
						 
				 
			
				
					
						
							
							
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						8afb6d62f2 
					 
					
						
						
							
							Fix typo in average_gradients function call (#2594)
						
						
						
						
							
						
					 
					
						2025-09-15 11:29:21 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6ccfa603cd 
					 
					
						
						
							
							fix metal scan (#2591)
						
						
						
						
							
						
					 
					
						2025-09-15 11:01:57 -07:00 
						 
				 
			
				
					
						
							
							
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						36cad99a11 
					 
					
						
						
							
							Refactor code examples to use 'gelu' (#2592)

						Updated code examples to use 'gelu' directly instead of 'nn.gelu'.
						
						
							
						
					 
					
						2025-09-15 09:47:02 -07:00 
						 
				 
			
				
					
						
							
							
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ee18e1cbf0 
					 
					
						
						
							
							patch bump (#2588)
						
						
						
						
							
 
						
					 
					
						2025-09-11 17:10:09 -07:00