dde3682b69  Cheng  2025-09-09 13:18:49 +09:00
    [CUDA] Use GEMM with epilogue instead of AddMM (#2569)
						 
				 
			
				
					
						
							
							
17310d91a6  Awni Hannun  2025-09-08 17:35:07 -07:00
    Add batch offsets for mx.fast.rope (#2564)
    * implement batch rope for Metal
    * cuda rope (#2576)
						 
				 
			
				
					
						
							
							
b194d65a6a  Cheng  2025-09-09 08:27:18 +09:00
    Some tweaks in cmake files (#2574)
    * Do proper check of Metal lib
    * Update doctest to get rid of cmake version hack
						 
				 
			
				
					
						
							
							
a44b27f5f8  Cheng  2025-09-09 07:41:05 +09:00
    Fix a few ccache cache miss (#2573)
    * Fix ccache cache miss
    * Do not define _VERSION_ in python bindings
						 
				 
			
				
					
						
							
							
e5a33f2223  Awni Hannun  2025-09-08 11:37:23 -07:00
    faster depthwise 1D conv (#2567)
						 
				 
			
				
					
						
							
							
c1e3340b23  Cheng  2025-09-07 09:00:31 +09:00
    Set ccache size before building (#2570)
						 
				 
			
				
					
						
							
							
8f163a367d  XXXXRT666  2025-09-04 09:08:11 -07:00
    typing: add type hints to mlx.core.array, linalg, distributed, and random (#2565)
    * Add type annotations to mlx methods
    * Missing list_or_scalar
						 
				 
			
				
					
						
							
							
89a3df9014  Manuel Villanueva  2025-09-03 12:52:08 -07:00
    Fixed several type annotations in the MLX stubs which degraded to Unknown/Any (#2560)

    * Added scalar to stubs to fix Unknown type hint

    Issue #2478 reports that several type annotations in the MLX stubs degrade
    to Unknown/Any in editors like VS Code with Pylance, due to missing imports
    (Union, Optional, Tuple) and an undefined scalar type alias. This PR updates
    the stub generation patterns to:

    * Add the missing typing imports in mlx.core.__prefix__ so that Union,
      Optional, Tuple, etc. are always available.
    * Define and export scalar: TypeAlias = Union[int, float, bool] in
      mlx.core.__suffix__ so that functions typed with Union[scalar, array]
      resolve correctly instead of falling back to Any.
    * Update the submodule stub prefixes (distributed, fast, linalg, metal,
      random) to import scalar alongside array, Device, and Stream, ensuring
      type checkers resolve the union consistently across modules.

    With these changes, functions like mlx.add now display rich type signatures
    such as:

        def add(
            a: scalar | array,
            b: scalar | array,
            stream: Stream | Device | None = None
        ) -> array

    instead of degrading to Any.

    * add bool to patterns

    Co-authored-by: Awni Hannun <awni@apple.com>
						 
				 
			
				
					
						
							
							
c5d2937aa5  Krishi Saripalli  2025-09-02 22:07:02 -07:00
    chore: Update Docs With Slice Copy Example (#2559)
    * chore: updated docs with slice copy example
    * nits
    Co-authored-by: Awni Hannun <awni@apple.com>
						 
				 
			
				
					
						
							
							
b61a65e313  Awni Hannun  2025-09-02 11:00:36 -07:00
    fix copies in sdpa (#2563)
						 
				 
			
				
					
						
							
							
04cbb4191c  wrmsr  2025-09-01 11:50:20 -07:00
    Fix dequantize python sig (#2562)
						 
				 
			
				
					
						
							
							
c5460762e7  Artur Antonov  2025-08-31 21:29:30 -07:00
    Fix AdamW weight_decay default value in docstring (#2557)
						 
				 
			
				
					
						
							
							
8ce49cd39e  Awni Hannun  2025-08-29 10:06:15 -07:00
    fix quantized vjp for mxfp4 (#2555)
						 
				 
			
				
					
						
							
							
9c68b50853  Awni Hannun  2025-08-29 06:54:17 -07:00
    version bump (#2554)
						 
				 
			
				
					
						
							
							
111f1e71af  Awni Hannun  2025-08-28 21:26:30 -07:00
    Faster contiguous gather for indices in the first axis (#2552)
    * faster contiguous gather for indices in the first axis
    * work per thread > 1
    * angelos suggestion for scales / biases
						 
				 
			
				
					
						
							
							
827003d568  Awni Hannun  2025-08-28 18:26:25 -07:00
    fix METAL quantization in JIT (#2553)
						 
				 
			
				
					
						
							
							
d363a76aa4  Awni Hannun  2025-08-28 13:13:34 -07:00
    Bump xcode in circle (#2551)
    * bump xcode in circle
    * bump xcode in circle
    * bump xcode in circle
						 
				 
			
				
					
						
							
							
70560b6bd5  Awni Hannun  2025-08-28 06:45:26 -07:00
    Add mode parameter for quantization (#2499)
    * add mode parameter for quantization
    * mxfp4 quantize/dequantize + start of optional biases
    * mxfp4 works
    * speedup
    * cpu mxfp4
    * fix
    * fix test tol
    * fix
    * refactor
    * add quant mode enum
						 
				 
			
				
					
						
							
							
7ef8a6f2d5  Awni Hannun  2025-08-27 19:48:43 -07:00
    [CUDA] fix sort (#2550)
    * [CUDA] fix sort
    * fix test
						 
				 
			
				
					
						
							
							
31c6f6e33f  Cheng  2025-08-28 09:30:08 +09:00
    [CUDA] Use ConcurrentContext in concatenate_gpu (#2549)
						 
				 
			
				
					
						
							
							
584d48458e  Awni Hannun  2025-08-27 10:01:07 -07:00
    link with nccl (#2546)
						 
				 
			
				
					
						
							
							
5cf984ca87  Cheng  2025-08-27 11:25:15 +09:00
    Separate cpu compilation cache by versions (#2548)
						 
				 
			
				
					
						
							
							
a9bac3d9e5  Cheng  2025-08-27 08:06:46 +09:00
    Run CPP tests for CUDA build in CI (#2544)
						 
				 
			
				
					
						
							
							
5458d43247  Awni Hannun  2025-08-26 14:24:47 -07:00
    add load with path tests (#2543)
						 
				 
			
				
					
						
							
							
a4dba65220  Awni Hannun  2025-08-26 12:50:38 -07:00
    Enable cuda graph toggle (#2545)
    * enable cuda graph toggle
    * increase cache size
						 
				 
			
				
					
						
							
							
3dcb286baf  Awni Hannun  2025-08-25 15:56:29 -07:00
    Remove stream from average grads so it uses default (#2532)
    * Remove stream from average grads so it uses default
    * comment
						 
				 
			
				
					
						
							
							
4822c3dbe9  Cheng  2025-08-26 07:31:39 +09:00
    [CUDA] Implement DynamicSlice/DynamicSliceUpdate (#2533)
    * Move DynamicSlice to gpu/primitives
    * Implement compute_dynamic_offset in CUDA
						 
				 
			
				
					
						
							
							
2ca75bb529  Awni Hannun  2025-08-25 15:20:18 -07:00
    Remove nccl install in release (#2542)
						 
				 
			
				
					
						
							
							
db14e29a0b  Awni Hannun  2025-08-25 14:58:49 -07:00
    allow pathlib.Path to save/load functions (#2541)
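Accepting pathlib.Path alongside plain strings, as this change does for the save/load functions, typically comes down to normalizing through os.fspath. A minimal sketch with a hypothetical save_text helper (not MLX's API):

```python
import os
import pathlib
import tempfile


def save_text(file: "str | os.PathLike[str]", data: str) -> None:
    # os.fspath accepts str and any PathLike (e.g. pathlib.Path) and
    # returns a plain path string for the underlying writer.
    path = os.fspath(file)
    with open(path, "w") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "out.txt"
    save_text(p, "hello")       # pathlib.Path accepted
    save_text(str(p), "hello")  # plain string still works
    assert p.read_text() == "hello"
```

Normalizing at the boundary keeps the rest of the implementation working with one path representation, which is the usual pattern for widening file arguments this way.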
						 
				 
			
				
					
						
							
							
d2f540f4e0  Awni Hannun  2025-08-25 14:17:25 -07:00
    Use nccl header only when nccl is not present (#2539)
    * use nccl header only when nccl is not present
    * larger machine for cuda build
						 
				 
			
				
					
						
							
							
333ffea273  Cheng  2025-08-24 16:22:36 +09:00
    [CUDA] Remove thrust in arange (#2535)
						 
				 
			
				
					
						
							
							
f55b6f1f2f  Cheng  2025-08-24 15:33:08 +09:00
    Enable COMPILE_WARNING_AS_ERROR for linux builds in CI (#2534)
						 
				 
			
				
					
						
							
							
30561229c7  Awni Hannun  2025-08-22 14:39:43 -07:00
    Fix allocation bug in NCCL (#2530)
						 
				 
			
				
					
						
							
							
068a4612e9  Awni Hannun  2025-08-22 12:24:27 -07:00
    nccl default for backend=any (#2528)
    * nccl default for backend=any
    * check num gpus + ensure row contiguous for all reduce
    * comment
						 
				 
			
				
					
						
							
							
5722c147de  Andrey Portnoy  2025-08-21 19:57:20 -07:00
    [CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 (#2525)
    * [CUDA] Update cudaMemAdvise and cudaGraphAddDependencies for CUDA 13
      These functions' signatures changed in CUDA 13, so we differentiate
      between CUDA 13 and preceding releases at compile time.
    * Mention NVIDIA in ACKNOWLEDGMENTS.md
						 
				 
			
				
					
						
							
							
f6819a1f26  Cheng  2025-08-22 10:29:55 +09:00
    Fix warning 186-D from nvcc (#2527)
						 
				 
			
				
					
						
							
							
f93f87c802  Awni Hannun  2025-08-21 17:57:49 -07:00
    nccl dep + default for cuda (#2526)
						 
				 
			
				
					
						
							
							
9392fc3f88  Anastasiia Filippova  2025-08-21 11:56:15 -07:00
    NCCL backend (#2476)
						 
				 
			
				
					
						
							
							
e843c4d8d5  Awni Hannun  2025-08-21 06:46:01 -07:00
    fix power (#2523)
						 
				 
			
				
					
						
							
							
0c5fc63a36  Angelos Katharopoulos  2025-08-20 17:56:06 -07:00
    Fix docs omission (#2524)
						 
				 
			
				
					
						
							
							
e397177f6e  Angelos Katharopoulos  2025-08-20 17:20:22 -07:00
    Custom cuda kernel (#2517)
						 
				 
			
				
					
						
							
							
f4c8888cbe  Cheng  2025-08-21 08:55:26 +09:00
    [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521)
						 
				 
			
				
					
						
							
							
25c1e03205  Angelos Katharopoulos  2025-08-20 08:03:29 -07:00
    Fix overflow in large filter small channels (#2520)
						 
				 
			
				
					
						
							
							
512281781c  russellizadi  2025-08-20 00:45:05 -07:00
    Remove state return from function example in compile documentation (#2518)
						 
				 
			
				
					
						
							
							
ac85ddfdb7  Cheng  2025-08-20 10:06:22 +09:00
    [CUDA] Add GEMM-based fallback convolution kernels (#2511)
    * Add gemm_conv
    * Add gemm_grouped_conv
						 
				 
			
				
					
						
							
							
65d0d40232  Cheng  2025-08-20 09:29:28 +09:00
    Split cuDNN helpers into a separate header (#2491)
    * Add RAII managed CudaGraph class
    * Implement forward rms_norm with cuDNN
    * Revert back to old rms norm kernel
						 
				 
			
				
					
						
							
							
cea9369610  Awni Hannun  2025-08-18 15:07:59 -07:00
    fix lapack svd (#2515)
						 
				 
			
				
					
						
							
							
e7c6e1db82  Awni Hannun  2025-08-18 08:33:38 -07:00
    no segfault with uninitialized array.at (#2514)
						 
				 
			
				
					
						
							
							
c5fcd5b61b  Awni Hannun  2025-08-18 06:45:59 -07:00
    fix custom kernel test (#2510)
						 
				 
			
				
					
						
							
							
1df9887998  Angelos Katharopoulos  2025-08-17 08:42:33 -07:00
    Ensure no oob read in gemv_masked (#2508)