Awni Hannun
c076794a22
implement batch rope for Metal
2025-09-06 08:40:27 -07:00
XXXXRT666
8f163a367d
typing: add type hints to mlx.core.array, linalg, distributed, and random ( #2565 )
* Add type annotations to mlx methods
* Missing list_or_scalar
2025-09-04 09:08:11 -07:00
Manuel Villanueva
89a3df9014
Fixed several type annotations in the MLX stubs which degraded to Unknown/Any ( #2560 )
* Added scalar to stubs to fix Unknown Type Hint
### Proposed changes
Issue #2478 reports that several type annotations in the MLX stubs degrade to Unknown/Any in editors like VS Code with Pylance, due to missing imports (Union, Optional, Tuple) and an undefined scalar type alias.
This PR updates the stub generation patterns to:
• Add missing typing imports in mlx.core.__prefix__ so that Union, Optional, Tuple, etc. are always available.
• Define and export scalar: TypeAlias = Union[int, float, bool] in mlx.core.__suffix__ so that functions typed with Union[scalar, array] resolve correctly instead of falling back to Any.
• Update submodule stub prefixes (distributed, fast, linalg, metal, random) to import scalar alongside array, Device, and Stream, ensuring type checkers resolve the union consistently across modules.
With these changes, functions like mlx.add now display rich type signatures such as:
```
def add(
    a: scalar | array,
    b: scalar | array,
    stream: Stream | Device | None = None
) -> array
```
instead of degrading to Any.
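For reference, a minimal sketch of how the pieces above could fit together in the generated stubs (the surrounding layout is illustrative; only the imports and the scalar alias come from this description):
```
# mlx.core.__prefix__: make the typing names unconditionally available.
from typing import Optional, Tuple, TypeAlias, Union

# mlx.core.__suffix__: define and export the scalar alias so that
# Union[scalar, array] resolves instead of falling back to Any.
scalar: TypeAlias = Union[int, float, bool]
```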
### Checklist
• I have read the CONTRIBUTING document
• I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
• I have added tests that prove my fix is effective or that my feature works (n/a — stub generation only)
• I have updated the necessary documentation (if needed)
* add bool to patterns
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-09-03 12:52:08 -07:00
Krishi Saripalli
c5d2937aa5
chore: Update Docs With Slice Copy Example ( #2559 )
* chore: updated docs with slice copy example
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-09-02 22:07:02 -07:00
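A minimal sketch of the slice-copy pattern the docs change above illustrates (the exact example added to the docs may differ):
```
import mlx.core as mx

a = mx.zeros((4, 4))
b = mx.ones((2, 4))

# Copy b into rows 1-2 of a via slice assignment.
a[1:3] = b
```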
Awni Hannun
b61a65e313
fix copies in sdpa ( #2563 )
2025-09-02 11:00:36 -07:00
wrmsr
04cbb4191c
Fix dequantize python sig ( #2562 )
2025-09-01 11:50:20 -07:00
Artur Antonov
c5460762e7
Fix AdamW weight_decay default value in docstring ( #2557 )
2025-08-31 21:29:30 -07:00
Awni Hannun
8ce49cd39e
fix quantized vjp for mxfp4 ( #2555 )
v0.29.0
2025-08-29 10:06:15 -07:00
Awni Hannun
9c68b50853
version bump ( #2554 )
2025-08-29 06:54:17 -07:00
Awni Hannun
111f1e71af
Faster contiguous gather for indices in the first axis ( #2552 )
* faster contiguous gather for indices in the first axis
* work per thread > 1
* angelos suggestion for scales / biases
2025-08-28 21:26:30 -07:00
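Illustrative sketch of the access pattern this change speeds up: gathering along the first axis of a contiguous array (names below are just an example):
```
import mlx.core as mx

x = mx.arange(12).reshape(4, 3)  # row-contiguous
idx = mx.array([3, 0, 2])

# Whole-row gather with indices in the first axis: the fast path here.
rows = mx.take(x, idx, axis=0)
```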
Awni Hannun
827003d568
fix METAL quantization in JIT ( #2553 )
2025-08-28 18:26:25 -07:00
Awni Hannun
d363a76aa4
Bump xcode in circle ( #2551 )
* bump xcode in circle
* bump xcode in circle
* bump xcode in circle
2025-08-28 13:13:34 -07:00
Awni Hannun
70560b6bd5
Add mode parameter for quantization ( #2499 )
* add mode parameter for quantization
* mxfp4 quantize/dequantize + start of optional biases
* mxfp4 works
* speedup
* cpu mxfp4
* fix
* fix test tol
* fix
* refactor
* add quant mode enum
2025-08-28 06:45:26 -07:00
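A hedged sketch of the resulting API (the "mxfp4" mode string and the group size shown are assumptions based on the bullets above):
```
import mlx.core as mx

w = mx.random.normal((512, 512))

# Default affine quantization, as before.
q_affine = mx.quantize(w, group_size=64, bits=4)

# The new mode parameter selects mxfp4 quantize/dequantize.
q_mxfp4 = mx.quantize(w, group_size=32, bits=4, mode="mxfp4")
```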
Awni Hannun
7ef8a6f2d5
[CUDA] fix sort ( #2550 )
* [CUDA] fix sort
* fix test
2025-08-27 19:48:43 -07:00
Cheng
31c6f6e33f
[CUDA] Use ConcurrentContext in concatenate_gpu ( #2549 )
2025-08-28 09:30:08 +09:00
Awni Hannun
584d48458e
link with nccl ( #2546 )
2025-08-27 10:01:07 -07:00
Cheng
5cf984ca87
Separate cpu compilation cache by versions ( #2548 )
2025-08-27 11:25:15 +09:00
Cheng
a9bac3d9e5
Run CPP tests for CUDA build in CI ( #2544 )
2025-08-27 08:06:46 +09:00
Awni Hannun
5458d43247
add load with path tests ( #2543 )
2025-08-26 14:24:47 -07:00
Awni Hannun
a4dba65220
Enable cuda graph toggle ( #2545 )
* enable cuda graph toggle
* increase cache size
2025-08-26 12:50:38 -07:00
Awni Hannun
3dcb286baf
Remove stream from average grads so it uses default ( #2532 )
* Remove stream from average grads so it uses default
* comment
2025-08-25 15:56:29 -07:00
Cheng
4822c3dbe9
[CUDA] Implement DynamicSlice/DynamicSliceUpdate ( #2533 )
* Move DynamicSlice to gpu/primitives
* Implement compute_dynamic_offset in CUDA
2025-08-26 07:31:39 +09:00
Awni Hannun
2ca75bb529
Remove nccl install in release ( #2542 )
2025-08-25 15:20:18 -07:00
Awni Hannun
db14e29a0b
allow pathlib.Path to save/load functions ( #2541 )
2025-08-25 14:58:49 -07:00
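With this change the save/load entry points accept pathlib.Path as well as str, e.g.:
```
from pathlib import Path
import mlx.core as mx

a = mx.arange(8)
p = Path("array.npy")

mx.save(p, a)   # previously required a string path
b = mx.load(p)
```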
Awni Hannun
d2f540f4e0
Use nccl header only when nccl is not present ( #2539 )
* use nccl header only when nccl is not present
* larger machine for cuda build
2025-08-25 14:17:25 -07:00
Cheng
333ffea273
[CUDA] Remove thrust in arange ( #2535 )
2025-08-24 16:22:36 +09:00
Cheng
f55b6f1f2f
Enable COMPILE_WARNING_AS_ERROR for linux builds in CI ( #2534 )
2025-08-24 15:33:08 +09:00
Awni Hannun
30561229c7
Fix allocation bug in NCCL ( #2530 )
2025-08-22 14:39:43 -07:00
Awni Hannun
068a4612e9
nccl default for backend=any ( #2528 )
* nccl default for backend=any
* check num gpus + ensure row contiguous for all reduce
* comment
2025-08-22 12:24:27 -07:00
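Sketch of the behavior described above: with backend="any", initialization now prefers NCCL when enough GPUs are present (illustrative; launcher and NCCL environment setup not shown):
```
import mlx.core as mx

# "any" picks an available backend; after this change it defaults to
# NCCL on CUDA machines.
group = mx.distributed.init(backend="any")

# Inputs to all_sum are made row contiguous, per the second bullet.
y = mx.distributed.all_sum(mx.ones((4,)))
```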
Andrey Portnoy
5722c147de
[CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 ( #2525 )
* [CUDA] Update cudaMemAdvise and cudaGraphAddDependencies for CUDA 13
These functions' signatures changed in CUDA 13, so we differentiate
between CUDA 13 and preceding releases at compile time.
* Mention NVIDIA in ACKNOWLEDGMENTS.md
2025-08-21 19:57:20 -07:00
Cheng
f6819a1f26
Fix warning 186-D from nvcc ( #2527 )
2025-08-22 10:29:55 +09:00
Awni Hannun
f93f87c802
nccl dep + default for cuda ( #2526 )
2025-08-21 17:57:49 -07:00
Anastasiia Filippova
9392fc3f88
NCCL backend ( #2476 )
2025-08-21 11:56:15 -07:00
Awni Hannun
e843c4d8d5
fix power ( #2523 )
2025-08-21 06:46:01 -07:00
Angelos Katharopoulos
0c5fc63a36
Fix docs omission ( #2524 )
2025-08-20 17:56:06 -07:00
Angelos Katharopoulos
e397177f6e
Custom cuda kernel ( #2517 )
2025-08-20 17:20:22 -07:00
Cheng
f4c8888cbe
[CUDA] Fix stride of singleton dims before passing to cuDNN ( #2521 )
2025-08-21 08:55:26 +09:00
Angelos Katharopoulos
25c1e03205
Fix overflow in large filter small channels ( #2520 )
2025-08-20 08:03:29 -07:00
russellizadi
512281781c
Remove state return from function example in compile documentation ( #2518 )
2025-08-20 00:45:05 -07:00
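The docs example in question compiles a function with captured state; after this change it no longer returns that state. A minimal sketch of the pattern (illustrative, not the exact docs snippet):
```
from functools import partial
import mlx.core as mx

state = [mx.array(0)]

@partial(mx.compile, inputs=state, outputs=state)
def step(x):
    # State is declared via inputs/outputs, so it need not be returned.
    state[0] = state[0] + 1
    return x * 2

y = step(mx.array(3))
```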
Cheng
ac85ddfdb7
[CUDA] Add GEMM-based fallback convolution kernels ( #2511 )
* Add gemm_conv
* Add gemm_grouped_conv
2025-08-20 10:06:22 +09:00
Cheng
65d0d40232
Split cuDNN helpers into a separate header ( #2491 )
* Add RAII managed CudaGraph class
* Implement forward rms_norm with cuDNN
* Revert back to old rms norm kernel
2025-08-20 09:29:28 +09:00
Awni Hannun
cea9369610
fix lapack svd ( #2515 )
2025-08-18 15:07:59 -07:00
Awni Hannun
e7c6e1db82
no segfault with uninitialized array.at ( #2514 )
2025-08-18 08:33:38 -07:00
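For context, array.at performs indexed updates; the fix above avoids a segfault when .at is used on an uninitialized array. Typical usage (illustrative):
```
import mlx.core as mx

a = mx.zeros((3,))
# Unlike in-place a[...] += 1, .at applies updates for repeated indices.
a = a.at[mx.array([0, 0, 2])].add(1)
```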
Awni Hannun
c5fcd5b61b
fix custom kernel test ( #2510 )
2025-08-18 06:45:59 -07:00
Angelos Katharopoulos
1df9887998
Ensure no oob read in gemv_masked ( #2508 )
2025-08-17 08:42:33 -07:00
Angelos Katharopoulos
73f22d6226
Ensure small sort doesn't use indices if not argsort ( #2506 )
2025-08-17 08:42:20 -07:00
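The distinction at play, illustratively: sort returns values only, so a small sort need not track indices, while argsort must:
```
import mlx.core as mx

x = mx.array([3, 1, 2])
mx.sort(x)     # [1, 2, 3]: no index bookkeeping needed
mx.argsort(x)  # [1, 2, 0]: the indices that sort x
```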
Cheng
c422050ca7
Update cuDNN Frontend to v1.14 ( #2505 )
2025-08-17 19:13:01 +09:00
Cheng
1ba18ff7d9
[CUDA] Fix conv grads with groups ( #2495 )
* Put reshape utils in one file
* [CUDA] Fix conv grads with groups
* Put the reshape utils in gpu/copy.h
2025-08-16 10:09:18 +09:00
Cheng
37b440faa8
Clean up code handling both std::vector and SmallVector ( #2493 )
2025-08-16 09:01:10 +09:00
Cheng
888b13ed63
Remove the hack around SmallVector in cpu compile ( #2494 )
2025-08-16 08:17:24 +09:00