Awni Hannun 
							
						 
					 
					
						
						
							
						
						e88f2d4a8e 
					 
					
						
						
							
							fix cross entropy axis param ( #2641 )  
						
						 
						
						
						
						
						* fix cross entropy axis param
* faster grad clipping 
						
						
							
						
					 
					
						2025-10-01 16:49:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						9cee557423 
					 
					
						
						
							
							Fix status message ( #2638 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-10-01 16:43:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						bbf1423953 
					 
					
						
						
							
							wait for tasks in cuda ( #2636 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-30 16:08:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						eb24267b56 
					 
					
						
						
							
							Compile now can attach arbitrary data to an entry ( #2634 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-30 13:33:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						dc371ae7a5 
					 
					
						
						
							
							fix for max block dim ( #2631 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-29 08:59:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								AN Long 
							
						 
					 
					
						
						
							
						
						e76a8dd5c5 
					 
					
						
						
							
							Fix incorrect path and typos ( #2630 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-28 06:03:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						b466dea982 
					 
					
						
						
							
							[CUDA] Make CudaEvent work with multi-device ( #2614 )  
						
						 
						
						
						
						
						* Set current device when creating cuda event
* Separate cuda events by device
* Avoid race condition in pool 
						
						
							
						
					 
					
						2025-09-27 11:27:17 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						7a6adda1e6 
					 
					
						
						
							
							Bump the version ( #2627 )  
						
						 
						
						
						
						
							
  v0.29.2
 
						
					 
					
						2025-09-26 15:15:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Angelos Katharopoulos 
							
						 
					 
					
						
						
							
						
						1a9f820af6 
					 
					
						
						
							
							Compiled should not end in broadcast ( #2622 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-26 13:36:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d4f4ff3c5e 
					 
					
						
						
							
							Allow None input to compiled functions ( #2621 )  
						
						 
						
						
						
						
						* Allow None input to compiled functions
						
						
							
						
					 
					
						2025-09-25 08:42:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jagrit Digani 
							
						 
					 
					
						
						
							
						
						7c7e48dbd1 
					 
					
						
						
							
							New tuning for small K gemv ( #2620 )  
						
						 
						
						
						
						
						* New tuning for small K gemv 
						
						
							
						
					 
					
						2025-09-23 12:28:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						fbbf3b9b3e 
					 
					
						
						
							
							Support pickling array for bfloat16 ( #2586 )  
						
						 
						
						
						
						
						* add bfloat16 pickling
* Improvements
* improve
---------
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
						
						
							
						
					 
					
						2025-09-22 20:12:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Daniel Yeh 
							
						 
					 
					
						
						
							
						
						bf01ad9367 
					 
					
						
						
							
							fix ( #2613 )  
						
						 
						
						
						
						
						Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
						
						
							
						
					 
					
						2025-09-22 20:12:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						ae438d05fa 
					 
					
						
						
							
							[CUDA] Recycle CUDA events ( #2604 )  
						
						 
						
						
						
						
						* Make CudaEvent a CudaHandle
* Add caching for CudaEvent
* Make sure cuda events are destroyed at last
* Fix headers
* SharedEvent => AtomicEvent
* RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent
* Remove unneeded asserts 
						
						
							
						
					 
					
						2025-09-23 10:42:03 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						711a645807 
					 
					
						
						
							
							avoid producing NaN in attention ( #2608 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-22 13:10:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Josh Bleecher Snyder 
							
						 
					 
					
						
						
							
						
						aa9d44b3d4 
					 
					
						
						
							
							implement Convolution::output_shape ( #2601 )  
						
						 
						
						
						
						
						- pull conv_out_shape out for re-use
- add Conv::output_shape
- add e2e python tests confirming shapeless=True support and correctness
Updates #2599  
						
						
							
						
					 
					
						2025-09-22 10:09:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ec2ab42888 
					 
					
						
						
							
							Lower sorted QMM gather threshold ( #2609 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-19 18:22:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						787c0d90cd 
					 
					
						
						
							
							Detect cache thrashing in LRUCache ( #2600 )  
						
						 
						
						
						
						
						* Detect cache thrashing in LRUCache
* Do not check cache thrashing in tests 
						
						
							
						
					 
					
						2025-09-19 09:12:14 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Oleksandr Bilous 
							
						 
					 
					
						
						
							
						
						e8b604a6a3 
					 
					
						
						
							
							fix: library loading for swift dynamic frameworks ( #2568 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-18 13:54:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						50cc09887f 
					 
					
						
						
							
							expose depends ( #2606 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-18 10:06:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						3f730e77aa 
					 
					
						
						
							
							Update export function example for array input ( #2598 )  
						
						 
						
						
						
						
						The example works once all input arrays are given the same shape.
						
						
							
						
					 
					
						2025-09-16 14:38:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						caecbe876a 
					 
					
						
						
							
							no copy batch rope ( #2595 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-15 14:23:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						8afb6d62f2 
					 
					
						
						
							
							Fix typo in average_gradients function call ( #2594 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-15 11:29:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						6ccfa603cd 
					 
					
						
						
							
							fix metal scan ( #2591 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-15 11:01:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Umberto Mignozzetti 
							
						 
					 
					
						
						
							
						
						36cad99a11 
					 
					
						
						
							
							Refactor code examples to use 'gelu' ( #2592 )  
						
						 
						
						
						
						
						Updated code examples to use 'gelu' directly instead of 'nn.gelu'. 
						
						
							
						
					 
					
						2025-09-15 09:47:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						ee18e1cbf0 
					 
					
						
						
							
							patch bump ( #2588 )  
						
						 
						
						
						
						
							
  v0.29.1
 
						
					 
					
						2025-09-11 17:10:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						af120c2bc0 
					 
					
						
						
							
							set nccl ABI version ( #2587 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-11 16:55:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						6a3acf2301 
					 
					
						
						
							
							[CUDA] Set bias as input when using bias epilogue ( #2584 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-11 15:31:09 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d6977f2a57 
					 
					
						
						
							
							Add sdpa with sinks ( #2558 )  
						
						 
						
						
						
						
						* add sdpa with sinks
* fix 2 pass
* fix matrix sdpa
* fix perf regression
* add to cuda (#2580)
						
						
							
						
					 
					
						2025-09-10 14:53:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Gökdeniz Gülmez 
							
						 
					 
					
						
						
							
						
						db5443e831 
					 
					
						
						
							
							Adding Relu2 ( #2582 )  
						
						 
						
						
						
						
						* initial commit
* update acknowledgments
* update __init__
* nits
* nits + format
* use mx.maximum(x, 0) directly instead of calling the relu function, and move relu6 below relu2 for tidier ordering
* same with _make_activation_module
* Update python/mlx/nn/layers/activations.py
upd
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* update funct.rst
* upd. layers.rst
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
						
						
							
						
					 
					
						2025-09-10 07:24:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						52b8384d10 
					 
					
						
						
							
							Fix flaky addmm tests ( #2581 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-10 14:22:22 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						44cc5da4bc 
					 
					
						
						
							
							[CUDA] Fix alpha not respected when using bias epilogue ( #2578 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-10 09:08:01 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						dde3682b69 
					 
					
						
						
							
							[CUDA] Use GEMM with epilogue instead of AddMM ( #2569 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-09 13:18:49 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						17310d91a6 
					 
					
						
						
							
							Add batch offsets for mx.fast.rope ( #2564 )  
						
						 
						
						
						
						
						* implement batch rope for Metal
* cuda rope (#2576)
						
						
							
						
					 
					
						2025-09-08 17:35:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						b194d65a6a 
					 
					
						
						
							
							Some tweaks in cmake files ( #2574 )  
						
						 
						
						
						
						
						* Do proper check of Metal lib
* Update doctest to get rid of cmake version hack 
						
						
							
						
					 
					
						2025-09-09 08:27:18 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						a44b27f5f8 
					 
					
						
						
							
							Fix a few ccache cache miss ( #2573 )  
						
						 
						
						
						
						
						* Fix ccache cache miss
* Do not define _VERSION_ in python bindings 
						
						
							
						
					 
					
						2025-09-09 07:41:05 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						e5a33f2223 
					 
					
						
						
							
							faster depthwise 1D conv ( #2567 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-08 11:37:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Cheng 
							
						 
					 
					
						
						
							
						
						c1e3340b23 
					 
					
						
						
							
							Set ccache size before building ( #2570 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-07 09:00:31 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								XXXXRT666 
							
						 
					 
					
						
						
							
						
						8f163a367d 
					 
					
						
						
							
							typing: add type hints to mlx.core.array, linalg, distributed, and random ( #2565 )  
						
						 
						
						
						
						
						* Add type annotations to mlx methods
* Missing list_or_scalar 
						
						
							
						
					 
					
						2025-09-04 09:08:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Manuel Villanueva 
							
						 
					 
					
						
						
							
						
						89a3df9014 
					 
					
						
						
							
							Fixed several type annotations in the MLX stubs which degraded to Unknown/Any ( #2560 )  
						
						 
						
						
						
						
						* Added scalar to stubs to fix Unknown Type Hint
### Proposed changes
Issue #2478 reports that several type annotations in the MLX stubs degrade to Unknown/Any in editors like VS Code with Pylance, due to missing imports (Union, Optional, Tuple) and an undefined scalar type alias.
This PR updates the stub generation patterns to:
- Add missing typing imports in mlx.core.__prefix__ so that Union, Optional, Tuple, etc. are always available.
- Define and export scalar: TypeAlias = Union[int, float, bool] in mlx.core.__suffix__ so that functions typed with Union[scalar, array] resolve correctly instead of falling back to Any.
- Update submodule stub prefixes (distributed, fast, linalg, metal, random) to import scalar alongside array, Device, and Stream, ensuring type checkers resolve the union consistently across modules.
With these changes, functions like mlx.add now display rich type signatures such as:
```
def add(
    a: scalar | array,
    b: scalar | array,
    stream: Stream | Device | None = None
) -> array
```
instead of degrading to Any.
### Checklist
- I have read the CONTRIBUTING document
- I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
- I have added tests that prove my fix is effective or that my feature works (n/a, stub generation only)
- I have updated the necessary documentation (if needed)
* add bool to patterns
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-09-03 12:52:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Krishi Saripalli 
							
						 
					 
					
						
						
							
						
						c5d2937aa5 
					 
					
						
						
							
							chore: Update Docs With Slice Copy Example ( #2559 )  
						
						 
						
						
						
						
						* chore: updated docs with slice copy example
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
						
						
							
						
					 
					
						2025-09-02 22:07:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						b61a65e313 
					 
					
						
						
							
							fix copies in sdpa ( #2563 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-02 11:00:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								wrmsr 
							
						 
					 
					
						
						
							
						
						04cbb4191c 
					 
					
						
						
							
							Fix dequantize python sig ( #2562 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-09-01 11:50:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Artur Antonov 
							
						 
					 
					
						
						
							
						
						c5460762e7 
					 
					
						
						
							
							Fix AdamW weight_decay default value in docstring ( #2557 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-08-31 21:29:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						8ce49cd39e 
					 
					
						
						
							
							fix quantized vjp for mxfp4 ( #2555 )  
						
						 
						
						
						
						
							
  v0.29.0
 
						
					 
					
						2025-08-29 10:06:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						9c68b50853 
					 
					
						
						
							
							version bump ( #2554 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-08-29 06:54:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						111f1e71af 
					 
					
						
						
							
							Faster contiguous gather for indices in the first axis ( #2552 )  
						
						 
						
						
						
						
						* faster contiguous gather for indices in the first axis
* work per thread > 1
* angelos suggestion for scales / biases 
						
						
							
						
					 
					
						2025-08-28 21:26:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						827003d568 
					 
					
						
						
							
							fix METAL quantization in JIT ( #2553 )  
						
						 
						
						
						
						
							
						
					 
					
						2025-08-28 18:26:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						d363a76aa4 
					 
					
						
						
							
							Bump xcode in circle ( #2551 )  
						
						 
						
						
						
						
						* bump xcode in circle
						
						
							
						
					 
					
						2025-08-28 13:13:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Awni Hannun 
							
						 
					 
					
						
						
							
						
						70560b6bd5 
					 
					
						
						
							
							Add mode parameter for quantization ( #2499 )  
						
						 
						
						
						
						
						* add mode parameter for quantization
* mxfp4 quantize/dequantize + start of optional biases
* mxfp4 works
* speedup
* cpu mxfp4
* fix
* fix test tol
* fix
* refactor
* add quant mode enum 
						
						
							
						
					 
					
						2025-08-28 06:45:26 -07:00