Awni Hannun
ec2ab42888
Lower sorted QMM gather threshold ( #2609 )
2025-09-19 18:22:55 -07:00
Cheng
787c0d90cd
Detect cache thrashing in LRUCache ( #2600 )
* Detect cache thrashing in LRUCache
* Do not check cache thrashing in tests
2025-09-19 09:12:14 +09:00
Oleksandr Bilous
e8b604a6a3
fix: library loading for swift dynamic frameworks ( #2568 )
2025-09-18 13:54:59 -07:00
Awni Hannun
50cc09887f
expose depends ( #2606 )
2025-09-18 10:06:15 -07:00
Umberto Mignozzetti
3f730e77aa
Update export function example for array input ( #2598 )
After changing the shapes to conform (the same shape for all inputs), the example works.
2025-09-16 14:38:05 -07:00
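A minimal sketch of the kind of export-function example the entry above refers to, assuming the current `mx.export_function` / `mx.import_function` API; the inputs are given matching shapes so the traced example works as described:
```
import mlx.core as mx

def fun(x, y):
    return x + y

# Example inputs with the same shape, as the fix describes
x = mx.zeros((4, 4))
y = mx.zeros((4, 4))

mx.export_function("fun.mlxfn", fun, x, y)   # trace and save the function
imported = mx.import_function("fun.mlxfn")   # load it back
(out,) = imported(x, y)                      # imported functions return a list of outputs
```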
Awni Hannun
caecbe876a
no copy batch rope ( #2595 )
2025-09-15 14:23:48 -07:00
Umberto Mignozzetti
8afb6d62f2
Fix typo in average_gradients function call ( #2594 )
2025-09-15 11:29:21 -07:00
Awni Hannun
6ccfa603cd
fix metal scan ( #2591 )
2025-09-15 11:01:57 -07:00
Umberto Mignozzetti
36cad99a11
Refactor code examples to use 'gelu' ( #2592 )
Updated code examples to use 'gelu' directly instead of 'nn.gelu'.
2025-09-15 09:47:02 -07:00
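For context, the style the doc examples move to looks roughly like this (a sketch, assuming `gelu` is imported directly from `mlx.nn`):
```
import mlx.core as mx
from mlx.nn import gelu  # imported once, then called directly

x = mx.random.normal((8,))
y = gelu(x)              # instead of nn.gelu(x)
```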
Awni Hannun
ee18e1cbf0
patch bump ( #2588 )
v0.29.1
2025-09-11 17:10:09 -07:00
Awni Hannun
af120c2bc0
set nccl ABI version ( #2587 )
2025-09-11 16:55:53 -07:00
Cheng
6a3acf2301
[CUDA] Set bias as input when using bias epilogue ( #2584 )
2025-09-11 15:31:09 +09:00
Awni Hannun
d6977f2a57
Add sdpa with sinks ( #2558 )
* add sdpa with sinks
* fix 2 pass
* fix matrix sdpa
* fix perf regression
* add to cuda (#2580 )
2025-09-10 14:53:00 -07:00
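A rough illustration of the feature named above; the `sinks` keyword and its per-head shape are assumptions based on the commit title, not a confirmed signature:
```
import mlx.core as mx

B, H, L, D = 1, 8, 128, 64
q = mx.random.normal((B, H, L, D))
k = mx.random.normal((B, H, L, D))
v = mx.random.normal((B, H, L, D))

# One "sink" logit per head that absorbs attention mass (shape is an assumption)
sinks = mx.zeros((H,))
out = mx.fast.scaled_dot_product_attention(q, k, v, scale=D**-0.5, sinks=sinks)
```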
Gökdeniz Gülmez
db5443e831
Adding Relu2 ( #2582 )
* in. com.
* upd. ackn.
* update __init__
* nits
* nits + format
* used mx.maximum(x, 0) instead of calling the function and moves relu6 under relu2 to make it nicer
* same with _make_activation_module
* Update python/mlx/nn/layers/activations.py
upd
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* update funct.rst
* upd. layers.rst
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-09-10 07:24:30 -07:00
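If Relu2 here denotes squared ReLU (an assumption; the bullets above only mention building it on `mx.maximum(x, 0)`), the activation is small enough to sketch:
```
import mlx.core as mx

def relu2(x):
    # Squared ReLU: max(x, 0) ** 2, using mx.maximum(x, 0) as noted in the bullets above.
    # Whether this matches the merged nn.relu2 exactly is an assumption.
    return mx.square(mx.maximum(x, 0))

y = relu2(mx.array([-1.0, 0.5, 2.0]))
```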
Cheng
52b8384d10
Fix flaky addmm tests ( #2581 )
2025-09-10 14:22:22 +09:00
Cheng
44cc5da4bc
[CUDA] Fix alpha not respected when using bias epilogue ( #2578 )
2025-09-10 09:08:01 +09:00
Cheng
dde3682b69
[CUDA] Use GEMM with epilogue instead of AddMM ( #2569 )
2025-09-09 13:18:49 +09:00
Awni Hannun
17310d91a6
Add batch offsets for mx.fast.rope ( #2564 )
* implement batch rope for Metal
* cuda rope (#2576 )
2025-09-08 17:35:07 -07:00
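A sketch of what per-batch offsets enable, assuming `mx.fast.rope` keeps its existing keyword arguments and now also accepts an array for `offset`:
```
import mlx.core as mx

# (batch, heads, sequence, head_dim); the layout here is illustrative
x = mx.random.normal((2, 1, 8, 64))

# Previously: a single scalar offset shared by the whole batch
y0 = mx.fast.rope(x, 64, traditional=False, base=10000.0, scale=1.0, offset=4)

# With this change: one position offset per batch entry (assumed usage)
offsets = mx.array([0, 4])
y1 = mx.fast.rope(x, 64, traditional=False, base=10000.0, scale=1.0, offset=offsets)
```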
Cheng
b194d65a6a
Some tweaks in cmake files ( #2574 )
* Do proper check of Metal lib
* Update doctest to get rid of cmake version hack
2025-09-09 08:27:18 +09:00
Cheng
a44b27f5f8
Fix a few ccache cache miss ( #2573 )
* Fix ccache cache miss
* Do not define _VERSION_ in python bindings
2025-09-09 07:41:05 +09:00
Awni Hannun
e5a33f2223
faster depthwise 1D conv ( #2567 )
2025-09-08 11:37:23 -07:00
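For reference, a depthwise 1D convolution in MLX is a grouped `mx.conv1d` with one filter per channel; a small sketch, with shapes assuming MLX's channels-last layout:
```
import mlx.core as mx

N, L, C = 2, 128, 64
x = mx.random.normal((N, L, C))           # input: (batch, length, channels)
w = mx.random.normal((C, 5, 1))           # weight: (out_channels, kernel, in_channels / groups)
y = mx.conv1d(x, w, padding=2, groups=C)  # groups == channels -> depthwise
```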
Cheng
c1e3340b23
Set ccache size before building ( #2570 )
2025-09-07 09:00:31 +09:00
XXXXRT666
8f163a367d
typing: add type hints to mlx.core.array, linalg, distributed, and random ( #2565 )
* Add type annotations to mlx methods
* Missing list_or_scalar
2025-09-04 09:08:11 -07:00
Manuel Villanueva
89a3df9014
Fixed several type annotations in the MLX stubs which degraded to Unknown/Any ( #2560 )
* Added scalar to stubs to fix Unknown Type Hint
### Proposed changes
Issue #2478 reports that several type annotations in the MLX stubs degrade to Unknown/Any in editors like VS Code with Pylance, due to missing imports (Union, Optional, Tuple) and an undefined scalar type alias.
This PR updates the stub generation patterns to:
• Add missing typing imports in mlx.core.__prefix__ so that Union, Optional, Tuple, etc. are always available.
• Define and export scalar: TypeAlias = Union[int, float, bool] in mlx.core.__suffix__ so that functions typed with Union[scalar, array] resolve correctly instead of falling back to Any.
• Update submodule stub prefixes (distributed, fast, linalg, metal, random) to import scalar alongside array, Device, and Stream, ensuring type checkers resolve the union consistently across modules.
With these changes, functions like mlx.add now display rich type signatures such as:
```
def add(
a: scalar | array,
b: scalar | array,
stream: Stream | Device | None = None
) -> array
```
instead of degrading to Any.
### Checklist
• I have read the CONTRIBUTING document
• I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
• I have added tests that prove my fix is effective or that my feature works (n/a — stub generation only)
• I have updated the necessary documentation (if needed)
* add bool to patterns
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-09-03 12:52:08 -07:00
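The alias described above amounts to a small addition in the generated stubs; a sketch of the relevant pieces, paraphrased from the PR description rather than copied from the repo:
```
# Sketch of the generated mlx/core stub after the change
from typing import Optional, Tuple, TypeAlias, Union  # imports added by the __prefix__ pattern

class Device: ...
class Stream: ...
class array: ...

# Exported in the __suffix__ so Union[scalar, array] no longer degrades to Any
scalar: TypeAlias = Union[int, float, bool]

def add(
    a: scalar | array,
    b: scalar | array,
    stream: Stream | Device | None = None,
) -> array: ...
```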
Krishi Saripalli
c5d2937aa5
chore: Update Docs With Slice Copy Example ( #2559 )
* chore: updated docs with slice copy example
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-09-02 22:07:02 -07:00
Awni Hannun
b61a65e313
fix copies in sdpa ( #2563 )
2025-09-02 11:00:36 -07:00
wrmsr
04cbb4191c
Fix dequantize python sig ( #2562 )
2025-09-01 11:50:20 -07:00
Artur Antonov
c5460762e7
Fix AdamW weight_decay default value in docstring ( #2557 )
2025-08-31 21:29:30 -07:00
Awni Hannun
8ce49cd39e
fix quantized vjp for mxfp4 ( #2555 )
v0.29.0
2025-08-29 10:06:15 -07:00
Awni Hannun
9c68b50853
version bump ( #2554 )
2025-08-29 06:54:17 -07:00
Awni Hannun
111f1e71af
Faster contiguous gather for indices in the first axis ( #2552 )
* faster contiguous gather for indices in the first axis
* work per thread > 1
* angelos suggestion for scales / biases
2025-08-28 21:26:30 -07:00
Awni Hannun
827003d568
fix METAL quantization in JIT ( #2553 )
2025-08-28 18:26:25 -07:00
Awni Hannun
d363a76aa4
Bump xcode in circle ( #2551 )
* bump xcode in circle
* bump xcode in circle
* bump xcode in circle
2025-08-28 13:13:34 -07:00
Awni Hannun
70560b6bd5
Add mode parameter for quantization ( #2499 )
* add mode parameter for quantization
* mxfp4 quantize/dequantize + start of optional biases
* mxfp4 works
* speedup
* cpu mxfp4
* fix
* fix test tol
* fix
* refactor
* add quant mode enum
2025-08-28 06:45:26 -07:00
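A usage sketch of the new mode parameter; the keyword values below follow the commit bullets ("mxfp4", optional biases) and are assumptions rather than a definitive API reference:
```
import mlx.core as mx

w = mx.random.normal((256, 256))

# Affine quantization (the pre-existing scheme), selected explicitly via the new mode argument
wq, scales, biases = mx.quantize(w, group_size=64, bits=4, mode="affine")
w_hat = mx.dequantize(wq, scales, biases, group_size=64, bits=4, mode="affine")

# MXFP4 mode added by this change; its return values may differ, so it is left unpacked here
packed = mx.quantize(w, group_size=32, bits=4, mode="mxfp4")
```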
Awni Hannun
7ef8a6f2d5
[CUDA] fix sort ( #2550 )
* [CUDA] fix sort
* fix test
2025-08-27 19:48:43 -07:00
Cheng
31c6f6e33f
[CUDA] Use ConcurrentContext in concatenate_gpu ( #2549 )
2025-08-28 09:30:08 +09:00
Awni Hannun
584d48458e
link with nccl ( #2546 )
2025-08-27 10:01:07 -07:00
Cheng
5cf984ca87
Separate cpu compilation cache by versions ( #2548 )
2025-08-27 11:25:15 +09:00
Cheng
a9bac3d9e5
Run CPP tests for CUDA build in CI ( #2544 )
2025-08-27 08:06:46 +09:00
Awni Hannun
5458d43247
add load with path tests ( #2543 )
2025-08-26 14:24:47 -07:00
Awni Hannun
a4dba65220
Enable cuda graph toggle ( #2545 )
* enable cuda graph toggle
* increase cache size
2025-08-26 12:50:38 -07:00
Awni Hannun
3dcb286baf
Remove stream from average grads so it uses default ( #2532 )
* Remove stream from average grads so it uses default
* comment
2025-08-25 15:56:29 -07:00
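For context, `average_gradients` sits between the gradient computation and the optimizer update in distributed training; a sketch of typical usage (the model and loss here are illustrative):
```
import mlx.core as mx
import mlx.nn as nn

model = nn.Linear(16, 16)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

x = mx.random.normal((8, 16))
y = mx.random.normal((8, 16))

loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
# After this change the gradient all-reduce runs on the default stream
grads = nn.average_gradients(grads)
```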
Cheng
4822c3dbe9
[CUDA] Implement DynamicSlice/DynamicSliceUpdate ( #2533 )
* Move DynamicSlice to gpu/primitives
* Implement compute_dynamic_offset in CUDA
2025-08-26 07:31:39 +09:00
Awni Hannun
2ca75bb529
Remove nccl install in release ( #2542 )
2025-08-25 15:20:18 -07:00
Awni Hannun
db14e29a0b
allow pathlib.Path to save/load functions ( #2541 )
2025-08-25 14:58:49 -07:00
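What the change enables, roughly: passing a `pathlib.Path` where these helpers previously expected a plain string path.
```
from pathlib import Path
import mlx.core as mx

a = mx.arange(10)
out = Path("checkpoints")
out.mkdir(exist_ok=True)

mx.save(out / "a", a)       # a Path now works; the .npy extension is appended
b = mx.load(out / "a.npy")
```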
Awni Hannun
d2f540f4e0
Use nccl header only when nccl is not present ( #2539 )
* use nccl header only when nccl is not present
* larger machine for cuda build
2025-08-25 14:17:25 -07:00
Cheng
333ffea273
[CUDA] Remove thrust in arange ( #2535 )
2025-08-24 16:22:36 +09:00
Cheng
f55b6f1f2f
Enable COMPILE_WARNING_AS_ERROR for linux builds in CI ( #2534 )
2025-08-24 15:33:08 +09:00
Awni Hannun
30561229c7
Fix allocation bug in NCCL ( #2530 )
2025-08-22 14:39:43 -07:00
Awni Hannun
068a4612e9
nccl default for backend=any ( #2528 )
* nccl default for backend=any
* check num gpus + ensure row contiguous for all reduce
* comment
2025-08-22 12:24:27 -07:00
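A sketch of what the default change means in practice; the backend names follow the existing `mx.distributed.init` options and the behavior described in the bullets above:
```
import mlx.core as mx

# With backend="any", NCCL is now preferred when CUDA GPUs are available,
# falling back to the other backends otherwise (per the bullets above).
group = mx.distributed.init(backend="any")

x = mx.ones((4, 4))
# Inputs to the all-reduce are made row contiguous, per the second bullet.
y = mx.distributed.all_sum(x, group=group)
```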