Awni Hannun 
							
						 
					 
					
						
						
							
						
						d378567cc6 
					 
					
						
						
							
							refactor for regular cuda malloc  
						
						 
						
						
						
						
							
						
					 
					
						2025-10-31 14:12:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b84fc978d3 
					 
					
						
						
							
							add pool threshold  
						
						 
						
						
						
						
							
						
					 
					
						2025-10-30 10:32:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						764b4b7ce8 
					 
					
						
						
							
							Use async cuda malloc managed with cuda 13  
						
						 
						
						
						
						
							
						
					 
					
						2025-10-30 10:32:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Mike Drob 
							
						 
					 
					
						
						
							
						
						74c1ed25bb 
					 
					
						
						
							
							Migrate CircleCI to GitHub Actions (#2716)
						
						 
						
						
						
						
						Co-authored-by: Joseph Heck <j_heck@apple.com>
						
						
							
						
					 
					
						2025-10-30 12:26:55 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec72b44417 
					 
					
						
						
							
							Add quantize/dequantize for mxfp8 and nvfp4 (#2688)
						
						 
						
						
						
						
						* Add quantize/dequantize slow path for mxfp8 and nvfp4
* fast cuda kernel for mx/nv quantization
* fallback for cuda < 12.8 (#2697)
* format (#2700)
* fix (#2701)
* metal kernels
* docs
* fix jit
* add default bits and group sizes
* improve quant docs
* fix output type of mxfp4 matmuls 
						
						
							
						
					 
					
						2025-10-28 16:23:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Melissa Kilby 
							
						 
					 
					
						
						
							
						
						460691a0e8 
					 
					
						
						
							
							fix: linux-{fedora}x86_64-build (#2707)
						
						 
						
						
						
						
						Signed-off-by: Melissa Kilby <mkilby@apple.com>
						
						
							
						
					 
					
						2025-10-27 16:36:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						969924cc69 
					 
					
						
						
							
							Fp8 conversion (#2686)
						
						 
						
						
						
						
						* add fp8 e4m3 converters
* add cuda
* default saturate to min/max
* fix for older OS
* fix no gpu/cpu
* fix saturate
* fix compile 
						
						
							
						
					 
					
						2025-10-27 16:35:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d1e06117e8 
					 
					
						
						
							
							bump python (#2694)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-27 11:34:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						539d8322d1 
					 
					
						
						
							
							add median op (#2705)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-27 11:33:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						c4767d110f 
					 
					
						
						
							
							fix addmm cpu (#2699)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-27 11:33:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								David Koski 
							
						 
					 
					
						
						
							
						
						895217f25b 
					 
					
						
						
							
							optionally load metallib from framework (#2702)
						
						 
						
						
						
						
						* optionally load metallib from framework
* pre-commit
* adjust logic 
						
						
							
						
					 
					
						2025-10-27 07:52:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Manuel Villanueva 
							
						 
					 
					
						
						
							
						
						0cfeeb60ca 
					 
					
						
						
							
							Einsum error msg improvement (#2690)
						
						 
						
						
						
						
						* Improved error message for Einsum
* Modifications via pre-commit
* format
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-10-27 06:31:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ronan Collobert 
							
						 
					 
					
						
						
							
						
						8f8af61a37 
					 
					
						
						
							
							fix warnings showing up with -Wall (#2692)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-24 11:43:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Manuel Villanueva 
							
						 
					 
					
						
						
							
						
						233384161e 
					 
					
						
						
							
							Improved mx.split() docs (#2689)
						
						 
						
						
						
						
						* Improved mx.split() documentation
* Fix typo in docstring for array split function
* add example
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-10-24 09:48:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						5bcf3a6794 
					 
					
						
						
							
							format  
						
						 
						
						
						
						
							
						
					 
					
						2025-10-22 16:08:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								wickedcoder 
							
						 
					 
					
						
						
							
						
						7707196297 
					 
					
						
						
							
							Merge commit from fork  
						
						 
						
						
						
						
						* add length validation to the header
* fix accessing out of bound index with .at() 
						
						
							
						
					 
					
						2025-10-22 15:31:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								wickedcoder 
							
						 
					 
					
						
						
							
						
						7e3471c987 
					 
					
						
						
							
							Merge commit from fork  
						
						 
						
						
						
						
						* add tensor->weights_data validation
* add null pointer check for tensor 
						
						
							
						
					 
					
						2025-10-22 15:31:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						9f0ba3ddf1 
					 
					
						
						
							
							patch bump (#2680)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-17 12:12:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						4bce5f9b2d 
					 
					
						
						
							
							suppress gcc 10.1 warnings (#2679)
						
						 
						
						
						
						
						* suppress gcc 10.1 warnings
* suppress gcc 10.1 warnings 
						
						
							
  v0.29.3
 
						
					 
					
						2025-10-17 12:09:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Anastasiia Filippova 
							
						 
					 
					
						
						
							
						
						e9eab527eb 
					 
					
						
						
							
							Nccl timeout (#2673)
						
						 
						
						
						
						
						* print the error & delete nccl group
* timeout for nccl binding
* typo
* revert error
* fixed a typo 
						
						
							
						
					 
					
						2025-10-14 12:29:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						36ca62dba8 
					 
					
						
						
							
							remove unused unary file (#2672)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-13 19:36:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Manuel Villanueva 
							
						 
					 
					
						
						
							
						
						9cbb1b0148 
					 
					
						
						
							
							Modified sort behavior when running CPU or Metal to match NumPy/JAX (#2667)
						
						 
						
						
						
						
						* Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior.
* Modified sort behavior when running CPU or Metal to match NumPy/JAX
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-10-13 14:36:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Fabrizio Milo 
							
						 
					 
					
						
						
							
						
						9bfc476d72 
					 
					
						
						
							
							Normalize README bullet formatting (#2671)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-13 12:13:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						25e2356316 
					 
					
						
						
							
							speed up scalars (#2669)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-13 12:10:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						226a1d24e0 
					 
					
						
						
							
							Debug cuda conv (#2662)
						
						 
						
						
						
						
						* use t4
* use t4 
						
						
							
						
					 
					
						2025-10-10 16:12:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						630350ad3e 
					 
					
						
						
							
							Precise sigmoid (#2659)
						
						 
						
						
						
						
						* bump patch
* Sigmoid matches PyTorch and is more precise on tails 
						
						
							
						
					 
					
						2025-10-10 10:05:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						380aeb58ae 
					 
					
						
						
							
							enable addmm low-precision cpu (#2661)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-10 09:50:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f37389d100 
					 
					
						
						
							
							bump patch (#2658)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-10 08:36:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e89e8b4272 
					 
					
						
						
							
							Export with callback (#2612)
						
						 
						
						
						
						
						* export with callback
* export with callback
* Add types, fix kwarg ordering bug + test
* cleanup, test, fix
* typos 
						
						
							
						
					 
					
						2025-10-08 19:24:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								AN Long 
							
						 
					 
					
						
						
							
						
						85a8824a8c 
					 
					
						
						
							
							Fix cumulative operations when axis=None (#2653)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-08 15:25:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						f5d4397e5c 
					 
					
						
						
							
							Fix fast synch when fence is waited before a command buffer is created (#2657)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-08 11:23:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						343e33b6d5 
					 
					
						
						
							
							fix all_gather vjp (#2654)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-07 06:05:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						0073096dd1 
					 
					
						
						
							
							Split name into directories for cuda jit (#2656)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-07 01:52:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						e3d004fed9 
					 
					
						
						
							
							Fix and refactor row-reduce (#2650)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-07 01:51:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a393435d28 
					 
					
						
						
							
							Speed up compile for node with many parents (#2649)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-03 19:30:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						a7a94b29d7 
					 
					
						
						
							
							Fix compile when outputs change (#2648)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-03 08:40:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						22a5da76c8 
					 
					
						
						
							
							Faster complex matmul (#2571)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-02 23:33:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andrey Portnoy 
							
						 
					 
					
						
						
							
						
						287c63a093 
					 
					
						
						
							
							Configure CMake to export compile_commands.json (#2645)
						
						 
						
						
						
						
						This helps enable LSP for code navigation using clangd. 
						
						
							
						
					 
					
						2025-10-02 15:40:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						1c9ae1eaa1 
					 
					
						
						
							
							cuda fix flaky test (#2646)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-02 15:40:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						c2c3e0b0a2 
					 
					
						
						
							
							[CUDA] Add a small column specialization to reduce (#2642)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-02 14:41:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b0cc71ae71 
					 
					
						
						
							
							Faster triu, tril, where with scalar (#2644)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-02 12:21:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e88f2d4a8e 
					 
					
						
						
							
							fix cross entropy axis param (#2641)
						
						 
						
						
						
						
						* fix cross entropy axis param
* faster grad clipping 
						
						
							
						
					 
					
						2025-10-01 16:49:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						9cee557423 
					 
					
						
						
							
							Fix status message (#2638)
						
						 
						
						
						
						
							
						
					 
					
						2025-10-01 16:43:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bbf1423953 
					 
					
						
						
							
							wait for tasks in cuda (#2636)
						
						 
						
						
						
						
							
						
					 
					
						2025-09-30 16:08:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						eb24267b56 
					 
					
						
						
							
							Compile now can attach arbitrary data to an entry (#2634)
						
						 
						
						
						
						
							
						
					 
					
						2025-09-30 13:33:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dc371ae7a5 
					 
					
						
						
							
							fix for max block dim (#2631)
						
						 
						
						
						
						
							
						
					 
					
						2025-09-29 08:59:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								AN Long 
							
						 
					 
					
						
						
							
						
						e76a8dd5c5 
					 
					
						
						
							
							Fix incorrect path and typos (#2630)
						
						 
						
						
						
						
							
						
					 
					
						2025-09-28 06:03:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						b466dea982 
					 
					
						
						
							
							[CUDA] Make CudaEvent work with multi-device (#2614)
						
						 
						
						
						
						
						* Set current device when creating cuda event
* Separate cuda events by device
* Avoid race condition in pool 
						
						
							
						
					 
					
						2025-09-27 11:27:17 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						7a6adda1e6 
					 
					
						
						
							
							Bump the version (#2627)
						
						 
						
						
						
						
							
  v0.29.2
 
						
					 
					
						2025-09-26 15:15:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						1a9f820af6 
					 
					
						
						
							
							Compiled should not end in broadcast (#2622)
						
						 
						
						
						
						
							
						
					 
					
						2025-09-26 13:36:09 -07:00