Angelos Katharopoulos
dd91ee9534
Refactoring launcher
2025-12-08 02:57:50 -08:00
Angelos Katharopoulos
8fab4f0929
Change the name to a fun pun
2025-12-04 14:20:52 -08:00
Angelos Katharopoulos
47af2c8cb0
Add headers for gcc
2025-12-04 14:20:52 -08:00
Angelos Katharopoulos
f40152ebc1
Expose per-backend availability in C++ and python
2025-12-04 14:20:52 -08:00
Angelos Katharopoulos
5d7e6a0642
Add a no_ibv
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
b9b78b1059
Add empty sum_scatter
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
45727b0c02
Add send/recv
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
2444fbdfe9
Make sure that there is space for work completions
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
f3b605e53c
Add working reduce and semi-working all gather
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
0388ae3aaf
Fix ring
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
d4c1de4a8b
Fix side channel initialization for more than 2 peers
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
4dbffb3954
All gather
2025-12-04 14:20:51 -08:00
Angelos Katharopoulos
b1a60b2d2d
Initial working all reduce
2025-12-04 14:20:51 -08:00
Awni Hannun
a6d6717181
fix compile copying ( #2871 )
2025-12-04 12:32:56 -08:00
Awni Hannun
941cfe23d7
Layer norm throws on dimension mismatch ( #2870 )
2025-12-04 11:21:05 -08:00
romanoneg
9abb0b8123
Added support for pytree types that inherit from tuple and typing.namedtuple ( #2845 )
2025-12-04 11:06:45 -08:00
Tian En "TianHeng
50d3914c67
Update gumbel function signature parameters ( #2868 )
2025-12-03 15:37:35 -08:00
Awni Hannun
cacbdbf995
Fix init from double ( #2861 )
2025-12-03 06:08:11 -08:00
Awni Hannun
193cdcd81a
Fix graph updating ( #2857 )
2025-12-02 17:12:24 -08:00
Awni Hannun
d8ceae7b77
Reduce JVP ( #2854 )
2025-12-02 16:17:47 -08:00
Awni Hannun
eff0e31f00
Fix export scatters ( #2852 )
2025-12-02 11:24:40 -08:00
Awni Hannun
6c5785bc2f
use thread local capture mode ( #2850 )
2025-12-01 19:02:47 -08:00
CCYeh
8879ee00eb
Support more Numpy interfaces for masked_scatter ( #2832 )
2025-12-01 17:51:02 -08:00
Cheng
6e762fe2e2
[CUDA] Migrate conv code to new cuDNN APIs ( #2847 )
2025-12-02 07:55:43 +09:00
Cheng
2b95d0c270
[CUDA] Use cuDNN attention when T_q != T_kv ( #2843 )
2025-11-27 09:58:43 +09:00
Chaoran Yu
b054838780
Added clarification to apply_fn parameter of apply_to_modules ( #2831 )
...
Co-authored-by: Awni Hannun <awni@apple.com>
2025-11-26 15:40:56 -08:00
Awni Hannun
dd79d3c465
[CUDA] Faster rms norm for small dimension ( #2838 )
2025-11-26 15:10:41 -08:00
Cheng
704fd1ae28
[CUDA] Support array mask in SDPA ( #2822 )
2025-11-26 11:08:58 +09:00
Cheng
c9f4dc851f
Merge build-cuda and build-linux actions ( #2783 )
2025-11-25 20:06:42 +09:00
Cheng
f8bd675655
[CUDA] Output of SDPA should have same layout with inputs ( #2826 )
2025-11-25 15:22:58 +09:00
Cheng
23a9168d34
[CUDA] Add debug env to save cuda graphs to dot files ( #2825 )
2025-11-25 15:22:36 +09:00
Awni Hannun
bca205e287
[CUDA] Exit on crash and more helpful errors ( #2830 )
2025-11-24 19:46:03 -08:00
CCYeh
1d4eacb737
Fix mx.core.linspace type annotation ( #2820 )
...
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-11-24 14:15:08 -08:00
dependabot[bot]
8abd37ad05
Bump actions/checkout from 5 to 6 ( #2828 )
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 06:04:46 -08:00
Andrey Portnoy
3e05cea9f8
Force cudaGraphExec reinstantiation when clusters are used ( #2813 )
...
Co-authored-by: Awni Hannun <awni@apple.com>
2025-11-22 12:43:49 -08:00
CCYeh
5b0f047226
Fix mx.core.load type annotation ( #2819 )
2025-11-22 11:09:44 -08:00
Harsh Sutaria
618c87af8c
Add float64 Eig and complex64 SVD/Eig support ( Fixes #2708 ) ( #2737 )
...
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2025-11-22 06:51:36 -08:00
Cheng
d5f61a93fa
Fix typo: refs/head/main => refs/heads/main ( #2818 )
2025-11-22 09:43:35 +09:00
Awni Hannun
4a09264236
Tolerance for some ops tests on cuda ( #2815 )
2025-11-21 16:06:16 -08:00
Awni Hannun
0dbc7e5bee
Centralize NAX condition ( #2811 )
2025-11-21 13:28:15 -08:00
Awni Hannun
0d68efd461
patch bump for future version ( #2804 )
2025-11-20 09:26:20 -08:00
Awni Hannun
f9e1a14135
[CUDA] Partly fix random for large sizes ( #2798 )
2025-11-20 07:27:50 -08:00
Awni Hannun
d8e9ded928
Fix cuda allocator copy condition ( #2800 )
2025-11-20 07:06:55 -08:00
Awni Hannun
60939d010c
Fix macos release target and linux arm release ( #2802 )
2025-11-19 21:37:50 -08:00
Awni Hannun
fdcd2923fd
patch + fix docs build ( #2799 )
2025-11-19 16:16:26 -08:00
Jagrit Digani
54f1cc6e3e
Add Neural Accelerator Support ( #2772 )
2025-11-19 15:06:00 -08:00
CCYeh
b3825ac149
Add Masked Scatter ( #2663 )
...
Co-authored-by: Awni Hannun <awni@apple.com>
Co-authored-by: Angelos Katharopoulos <katharas@gmail.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-11-19 14:53:32 -08:00
Awni Hannun
7f4b7e553c
version ( #2797 )
2025-11-19 14:11:16 -08:00
Awni Hannun
ad16f41a7f
Fix version tag ( #2790 )
2025-11-19 08:55:57 -08:00
Awni Hannun
f46877bc08
more accurate rope fallback ( #2792 )
2025-11-19 06:07:21 -08:00
Cheng
6f35017d1b
[CUDA] cuDNN backward attention ( #2762 )
2025-11-19 08:13:50 +09:00
Awni Hannun
b167f0df1c
build docs on linux ( #2787 )
2025-11-18 08:01:03 -08:00
Cheng
a9f0d6b160
Avoid duplicate CI runs when starting a PR from upstream branch ( #2788 )
2025-11-18 15:16:25 +09:00
Cheng
940f4c7818
Fix building with CUDA < 12.8 ( #2782 )
2025-11-18 12:55:19 +09:00
Cheng
35f81728f1
Remove unneeded tests in nightly build ( #2786 )
2025-11-18 08:09:58 +09:00
Cheng
4442ed86c1
Fix nightly build ( #2785 )
2025-11-18 08:07:51 +09:00
Cheng
698559c231
Test every commit in main branch ( #2781 )
2025-11-18 08:07:22 +09:00
Cheng
ecc4879b07
Do not run CPU tests in CUDA builds ( #2784 )
2025-11-18 07:27:09 +09:00
Cheng
32b18d8b66
Use std::optional for mask_arr arg ( #2763 )
2025-11-17 10:43:33 +09:00
Cheng
472c43a0c8
Build and test with multiple CUDA versions ( #2780 )
2025-11-17 09:19:02 +09:00
Cheng
b7214ff01e
Remove pip cache in GitHub Actions ( #2776 )
...
* Correctly set pip cache key
* [Debug] Try disabling pip cache
2025-11-17 08:19:59 +09:00
Cheng
76414c8971
Run CI for pushes ( #2777 )
2025-11-17 07:19:01 +09:00
Awni Hannun
49e4566df3
fix release 2 ( #2767 )
...
* fix release 2
* login
* fix
2025-11-16 11:39:53 -08:00
Awni Hannun
aad49f932f
[CUDA] Tune ops per buffer based on device ( #2761 )
...
* tune ops per buffer based on device
* tune memory limit as well
* add tuning for spark
2025-11-16 06:29:49 -08:00
Cheng
86765cce34
Use ccache in GitHub Actions ( #2773 )
...
* Remove unnecessary steps
* Use ccache
* Log when using ccache
* Set max-size to 1GB
* Pass --no-build-isolation
* Remove more unused things
2025-11-16 07:58:14 +09:00
Cheng
1bedcbd556
Fix warnings with cmake 4.1 ( #2774 )
2025-11-16 07:12:47 +09:00
Cheng
9ac7dbe877
Fix MPI distributed tests with CUDA backend ( #2775 )
2025-11-16 07:12:18 +09:00
Awni Hannun
1bf605d56d
use arch specific targets when possible ( #2771 )
2025-11-14 20:04:18 -08:00
Cheng
3c622ddd1d
Separate test-linux from build-linux/cuda in GitHub Actions ( #2765 )
...
* Separate test-linux from build-linux/cuda in GitHub Actions
* Prefer unittest when possible
Co-authored-by: Mike Drob <mdrob@apache.org>
---------
Co-authored-by: Mike Drob <mdrob@apache.org>
2025-11-15 11:14:09 +09:00
Awni Hannun
27ff069175
Fix exporting with constants ( #2769 )
2025-11-14 12:52:08 -08:00
Cheng
3b2ffcefc3
[CUDA] cuDNN forward attention ( #2743 )
...
* Separate sdpa kernels in another file
* Initial support for cuDNN SDPA
* Disable a few corner cases
* Remove scaled_dot_product_attention.h
* Use cuDNN attention for prefilling
* cuDNN SDPA requires Ampere and later
* Address reviews
* Do contiguous copy of inputs
2025-11-14 09:23:56 +09:00
Awni Hannun
b65f882df3
fix release ( #2759 )
2025-11-13 15:34:01 -08:00
Cheng
b704e9e77a
[CUDA] Check CUDA error in synchronize ( #2757 )
2025-11-14 07:10:23 +09:00
Awni Hannun
66519fb348
fix slice ( #2758 )
2025-11-13 11:30:02 -08:00
Awni Hannun
8973550ff3
export custom kernel ( #2756 )
2025-11-13 11:29:50 -08:00
Mike Drob
3f866be665
minor debugging for publishing ( #2739 )
...
* minor debugging for publishing
* fix logic
2025-11-12 06:33:39 -08:00
Awni Hannun
23f81ed1c1
Linux on arm ( #2751 )
...
* try linux on arm
* ssh
* fix
2025-11-11 11:41:14 -08:00
wrmsr
3fe2250c00
Fix irregular_strides benchmark shape type ( #2754 )
2025-11-11 11:40:22 -08:00
Awni Hannun
047114b988
remove circle ( #2753 )
2025-11-11 11:39:47 -08:00
wrmsr
9320eb89a8
Fix dequantize python sig (dtype default) ( #2752 )
2025-11-11 09:55:24 -08:00
Awni Hannun
75819d70ea
patch bump ( #2750 )
2025-11-11 08:49:14 -08:00
Awni Hannun
60d80a3728
fix release builds ( #2746 )
2025-11-11 07:44:30 -08:00
Pedro Cuenca
eba6a9d163
Compatibility with pip-installed openmpi ( #2741 )
2025-11-07 16:58:31 -08:00
CCYeh
be9e2aebd6
Shapeless support for zeros/ones_like ( #2726 )
...
* shapeless support for zeros/ones_like
* Improvements
* fix access after moved
2025-11-06 19:12:20 -08:00
Awni Hannun
df58b4133a
[CUDA] Reduce use of managed memory ( #2725 )
...
* Use async cuda malloc managed with cuda 13
* add pool threshold
* refactor for regular cuda malloc
* load eval gpu for cuda
* remove use of cuda pool, use cuda free async
* fix
* fix
* fix
* fix
* fix + comment
2025-11-05 16:05:23 -08:00
Anastasiia Filippova
27778156dc
Nccl reduce scatter, all gather ( #2727 )
...
* Added reduce scatter and all gather for nccl
* fix unused import, delete unused file
* small fix
* deleted useless condition
* fixed comments
* fix bug in eval_gpu, renamed to sum_scatter, fix docs
* final fix docs
* remove and
* Update mlx/distributed/mpi/mpi.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* fix broken set input output
* fixes set output
* typo
* fix typo
* no cpu, no gpu for reduce scatter
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-11-05 08:21:11 -08:00
Mike Drob
761f901a41
fix property name ( #2736 )
2025-11-05 06:31:56 -06:00
Angelos Katharopoulos
6ece97f69b
Make cpu binary_op easily accessible ( #2733 )
2025-11-05 01:08:41 -08:00
Awni Hannun
d3bc6a9bff
don't test when doing release ( #2734 )
2025-11-04 15:54:23 -08:00
Awni Hannun
26ceb507eb
only build for macos 14 and up ( #2731 )
...
* only build for macos 14 and up
* bump metal cpp
2025-11-04 09:44:15 -08:00
Mike Drob
910b3e3299
skip self-hosted runners on forks ( #2730 )
2025-11-03 16:22:13 -06:00
Harsh Sutaria
50fa315d18
Fix addmm with empty matrices and beta != 1.0 ( #2715 )
2025-11-03 14:16:15 -08:00
AN Long
1ff2b713b6
Check isnan in maximum / minimum with CPU backend ( #2652 )
...
* Check isnan in maximum / minimum with CPU backend
* Add tests
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-11-03 08:51:14 -08:00
Mike Drob
50514a6146
Set up publishing to PyPI and Test-PyPI ( #2721 )
2025-11-03 07:20:11 -08:00
Awni Hannun
93d76b0f30
Fix compile multi capture ( #2678 )
...
* fix compile when compiling multiple lambdas with the same capture
* add test
2025-11-03 06:33:43 -08:00
David Koski
78678de0cd
add null check -- the bundleIdentifier is optional ( #2709 )
...
* add null check -- the bundleIdentifier is optional
* use variable
2025-11-03 06:33:21 -08:00
Melissa Kilby
ed9c6b1117
update: add linux fedora container CI - CPP build test only ( #2722 )
...
* update: add linux_fedora_build_cpp CI - CPP build test only - x86-64
Signed-off-by: Melissa Kilby <mkilby@apple.com>
* update: add linux_fedora_build_cpp_aarch64 CI - CPP build test only - arm64
Co-authored-by: Mike Drob <mdrob@apple.com>
Signed-off-by: Melissa Kilby <mkilby@apple.com>
* update: convert linux_fedora_build_cpp to matrix.arch loop
Co-authored-by: Mike Drob <mdrob@apple.com>
Signed-off-by: Melissa Kilby <mkilby@apple.com>
---------
Signed-off-by: Melissa Kilby <mkilby@apple.com>
Co-authored-by: Mike Drob <mdrob@apple.com>
2025-11-03 06:33:00 -08:00
Awni Hannun
39b04ce638
use faster dequant for fp4 qmv ( #2720 )
2025-10-31 11:49:59 -07:00
Mike Drob
d9e6349657
fix docs path ( #2719 )
2025-10-30 19:12:49 -05:00
Angelos Katharopoulos
b901a9f311
Fix the order of hosts in the ring ( #2718 )
2025-10-30 15:02:39 -07:00
Awni Hannun
68c5fa1c95
fix memory count bug ( #2717 )
2025-10-30 14:27:15 -07:00
Christopher Webb
793a31eeb6
Fix missing domain_uuid_key in thunderbolt ring setup ( #2682 )
2025-10-30 13:17:20 -07:00
Mike Drob
74c1ed25bb
Migrate CircleCI to GitHub Actions ( #2716 )
...
Co-authored-by: Joseph Heck <j_heck@apple.com>
2025-10-30 12:26:55 -05:00
Awni Hannun
ec72b44417
Add quantize/dequantize for mxfp8 and nvfp4 ( #2688 )
...
* Add quantize/dequantize slow path for mxfp8 and nvfp4
* fast cuda kernel for mx/nv quantization
* fallback for cuda < 12.8 (#2697)
* format (#2700)
* fix (#2701)
* metal kernels
* docs
* fix jit
* add default bits and group sizes
* improve quant docs
* fix output type of mxfp4 matmuls
2025-10-28 16:23:12 -07:00
Melissa Kilby
460691a0e8
fix: linux-{fedora}x86_64-build ( #2707 )
...
Signed-off-by: Melissa Kilby <mkilby@apple.com>
2025-10-27 16:36:08 -07:00
Awni Hannun
969924cc69
Fp8 conversion ( #2686 )
...
* add fp8 e4m3 converters
* add cuda
* default saturate to min/max
* fix for older OS
* fix no gpu/cpu
* fix saturate
* fix compile
2025-10-27 16:35:50 -07:00
Awni Hannun
d1e06117e8
bump python ( #2694 )
2025-10-27 11:34:31 -07:00
Awni Hannun
539d8322d1
add median op ( #2705 )
2025-10-27 11:33:42 -07:00
Awni Hannun
c4767d110f
fix addmm cpu ( #2699 )
2025-10-27 11:33:32 -07:00
David Koski
895217f25b
optionally load metallib from framework ( #2702 )
...
* optionally load metallib from framework
* pre-commit
* adjust logic
2025-10-27 07:52:03 -07:00
Manuel Villanueva
0cfeeb60ca
Einsum error msg improvement ( #2690 )
...
* Improved error message for Einsum
* Modifications via pre-commit
* format
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-10-27 06:31:47 -07:00
Ronan Collobert
8f8af61a37
fix warnings showing up with -Wall ( #2692 )
2025-10-24 11:43:35 -07:00
Manuel Villanueva
233384161e
Improved mx.split() docs ( #2689 )
...
* Improved mx.split() documentation
* Fix typo in docstring for array split function
* add example
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-10-24 09:48:41 -07:00
Awni Hannun
5bcf3a6794
format
2025-10-22 16:08:47 -07:00
wickedcoder
7707196297
Merge commit from fork
...
* add length validation to the header
* fix accessing out of bound index with .at()
2025-10-22 15:31:25 -07:00
wickedcoder
7e3471c987
Merge commit from fork
...
* add tensor->weights_data validation
* add null pointer check for tensor
2025-10-22 15:31:03 -07:00
Awni Hannun
9f0ba3ddf1
patch bump ( #2680 )
2025-10-17 12:12:07 -07:00
Awni Hannun
4bce5f9b2d
suppress gcc 10.1 warnings ( #2679 )
...
* suppress gcc 10.1 warnings
* suppress gcc 10.1 warnings
2025-10-17 12:09:21 -07:00
Anastasiia Filippova
e9eab527eb
Nccl timeout ( #2673 )
...
* print the error & delete nccl group
* timeout for nccl binding
* typo
* revert error
* fixed a typo
2025-10-14 12:29:54 -07:00
Awni Hannun
36ca62dba8
remove unused unary file ( #2672 )
2025-10-13 19:36:26 -07:00
Manuel Villanueva
9cbb1b0148
Modified sort behavior when running CPU or Metal to match NumPy/JAX ( #2667 )
...
* Modified sort behavior when running CPU or Metal to match NumPy/JAX sorting behavior.
* Modified sort behavior when running CPU or Metal to match NumPy/JAX
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-10-13 14:36:45 -07:00
Fabrizio Milo
9bfc476d72
Normalize README bullet formatting ( #2671 )
2025-10-13 12:13:30 -07:00
Awni Hannun
25e2356316
speed up scalars ( #2669 )
2025-10-13 12:10:15 -07:00
Awni Hannun
226a1d24e0
Debug cuda conv ( #2662 )
...
* use t4
* use t4
2025-10-10 16:12:47 -07:00
Awni Hannun
630350ad3e
Precise sigmoid ( #2659 )
...
* bump patch
* Sigmoid matches PyTorch and is more precise on tails
2025-10-10 10:05:23 -07:00
Awni Hannun
380aeb58ae
enable admm low-precision cpu ( #2661 )
2025-10-10 09:50:54 -07:00
Awni Hannun
f37389d100
bump patch ( #2658 )
2025-10-10 08:36:41 -07:00
Awni Hannun
e89e8b4272
Export with callback ( #2612 )
...
* export with callback
* export with callback
* Add types, fix kwarg ordering bug + test
* cleanup, test, fix
* typos
2025-10-08 19:24:33 -07:00
AN Long
85a8824a8c
Fix cumulative operations when axis=None ( #2653 )
2025-10-08 15:25:38 -07:00
Awni Hannun
f5d4397e5c
Fix fast synch when fence is waited before a command buffer is created ( #2657 )
2025-10-08 11:23:46 -07:00
Awni Hannun
343e33b6d5
fix all_gather vjp ( #2654 )
2025-10-07 06:05:23 -07:00
Angelos Katharopoulos
0073096dd1
Split name into directories for cuda jit ( #2656 )
2025-10-07 01:52:58 -07:00
Angelos Katharopoulos
e3d004fed9
Fix and refactor row-reduce ( #2650 )
2025-10-07 01:51:08 -07:00
Awni Hannun
a393435d28
Speed up compile for node with many parents ( #2649 )
2025-10-03 19:30:36 -07:00
Awni Hannun
a7a94b29d7
Fix compile when outputs change ( #2648 )
2025-10-03 08:40:57 -07:00
Daniel Yeh
22a5da76c8
Faster complex matmul ( #2571 )
2025-10-02 23:33:15 -07:00
Andrey Portnoy
287c63a093
Configure CMake to export compile_commands.json ( #2645 )
...
This helps enable LSP for code navigation using clangd.
2025-10-02 15:40:32 -07:00
Awni Hannun
1c9ae1eaa1
cuda fix flaky test ( #2646 )
2025-10-02 15:40:04 -07:00
Angelos Katharopoulos
c2c3e0b0a2
[CUDA] Add a small column specialization to reduce ( #2642 )
2025-10-02 14:41:05 -07:00
Awni Hannun
b0cc71ae71
Faster triu, tril, where with scalar ( #2644 )
2025-10-02 12:21:27 -07:00
Awni Hannun
e88f2d4a8e
fix cross entropy axis param ( #2641 )
...
* fix cross entropy axis param
* faster grad clipping
2025-10-01 16:49:55 -07:00
Angelos Katharopoulos
9cee557423
Fix status message ( #2638 )
2025-10-01 16:43:45 -07:00
Awni Hannun
bbf1423953
wait for tasks in cuda ( #2636 )
2025-09-30 16:08:46 -07:00
Angelos Katharopoulos
eb24267b56
Compile now can attach arbitrary data to an entry ( #2634 )
2025-09-30 13:33:27 -07:00
Awni Hannun
dc371ae7a5
fix for max block dim ( #2631 )
2025-09-29 08:59:25 -07:00
AN Long
e76a8dd5c5
Fix incorrect path and typos ( #2630 )
2025-09-28 06:03:04 -07:00
Cheng
b466dea982
[CUDA] Make CudaEvent work with multi-device ( #2614 )
...
* Set current device when creating cuda event
* Separate cuda events by device
* Avoid race condition in pool
2025-09-27 11:27:17 +09:00
Angelos Katharopoulos
7a6adda1e6
Bump the version ( #2627 )
2025-09-26 15:15:28 -07:00
Angelos Katharopoulos
1a9f820af6
Compiled should not end in broadcast ( #2622 )
2025-09-26 13:36:09 -07:00
Awni Hannun
d4f4ff3c5e
Allow None input to compiled functions ( #2621 )
...
* Allow None input to compiled functions
* Allow None input to compiled functions
2025-09-25 08:42:23 -07:00
Jagrit Digani
7c7e48dbd1
New tuning for small K gemv ( #2620 )
...
* New tuning for small K gemv
2025-09-23 12:28:35 -07:00
Daniel Yeh
fbbf3b9b3e
Support pickling array for bfloat16 ( #2586 )
...
* add bfloat16 pickling
* Improvements
* improve
---------
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2025-09-22 20:12:15 -07:00
Daniel Yeh
bf01ad9367
fix ( #2613 )
...
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de>
2025-09-22 20:12:04 -07:00
Cheng
ae438d05fa
[CUDA] Recycle CUDA events ( #2604 )
...
* Make CudaEvent a CudaHandle
* Add caching for CudaEvent
* Make sure cuda events are destroyed at last
* Fix headers
* SharedEvent => AtomicEvent
* RawCudaEvent => CudaEventHandle, CudaEventWrapper => CopyableCudaEvent
* Remove unneeded asserts
2025-09-23 10:42:03 +09:00
Awni Hannun
711a645807
avoid producing NaN in attention ( #2608 )
2025-09-22 13:10:43 -07:00
Josh Bleecher Snyder
aa9d44b3d4
implement Convolution::output_shape ( #2601 )
...
- pull conv_out_shape out for re-use
- add Conv::output_shape
- add e2e python tests confirming shapeless=True support and correctness
Updates #2599
2025-09-22 10:09:45 -07:00
Awni Hannun
ec2ab42888
Lower sorted QMM gather threshold ( #2609 )
2025-09-19 18:22:55 -07:00
Cheng
787c0d90cd
Detect cache thrashing in LRUCache ( #2600 )
...
* Detect cache thrashing in LRUCache
* Do not check cache thrashing in tests
2025-09-19 09:12:14 +09:00
Oleksandr Bilous
e8b604a6a3
fix: library loading for swift dynamic frameworks ( #2568 )
2025-09-18 13:54:59 -07:00
Awni Hannun
50cc09887f
expose depends ( #2606 )
2025-09-18 10:06:15 -07:00
Umberto Mignozzetti
3f730e77aa
Update export function example for array input ( #2598 )
...
After changing the shape to conform (same shapes for all objects), the example works.
2025-09-16 14:38:05 -07:00
Awni Hannun
caecbe876a
no copy batch rope ( #2595 )
2025-09-15 14:23:48 -07:00
Umberto Mignozzetti
8afb6d62f2
Fix typo in average_gradients function call ( #2594 )
2025-09-15 11:29:21 -07:00
Awni Hannun
6ccfa603cd
fix metal scan ( #2591 )
2025-09-15 11:01:57 -07:00
Umberto Mignozzetti
36cad99a11
Refactor code examples to use 'gelu' ( #2592 )
...
Updated code examples to use 'gelu' directly instead of 'nn.gelu'.
2025-09-15 09:47:02 -07:00
Awni Hannun
ee18e1cbf0
patch bump ( #2588 )
2025-09-11 17:10:09 -07:00
Awni Hannun
af120c2bc0
set nccl ABI version ( #2587 )
2025-09-11 16:55:53 -07:00
Cheng
6a3acf2301
[CUDA] Set bias as input when using bias epilogue ( #2584 )
2025-09-11 15:31:09 +09:00
Awni Hannun
d6977f2a57
Add sdpa with sinks ( #2558 )
...
* add sdpa with sinks
* fix 2 pass
* fix matrix sdpa
* fix perf regression
* add to cuda (#2580)
2025-09-10 14:53:00 -07:00
Gökdeniz Gülmez
db5443e831
Adding Relu2 ( #2582 )
...
* in. com.
* upd. ackn.
* update __init__
* nits
* nits + format
* used mx.maximum(x, 0) instead of calling the function and moves relu6 under relu2 to make it nicer
* same with _make_activation_module
* Update python/mlx/nn/layers/activations.py
upd
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* update funct.rst
* upd. layers.rst
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2025-09-10 07:24:30 -07:00
Cheng
52b8384d10
Fix flaky addmm tests ( #2581 )
2025-09-10 14:22:22 +09:00
Cheng
44cc5da4bc
[CUDA] Fix alpha not respected when using bias epilogue ( #2578 )
2025-09-10 09:08:01 +09:00
Cheng
dde3682b69
[CUDA] Use GEMM with epilogue instead of AddMM ( #2569 )
2025-09-09 13:18:49 +09:00
Awni Hannun
17310d91a6
Add batch offsets for mx.fast.rope ( #2564 )
...
* implement batch rope for Metal
* cuda rope (#2576)
2025-09-08 17:35:07 -07:00
Cheng
b194d65a6a
Some tweaks in cmake files ( #2574 )
...
* Do proper check of Metal lib
* Update doctest to get rid of cmake version hack
2025-09-09 08:27:18 +09:00
Cheng
a44b27f5f8
Fix a few ccache cache miss ( #2573 )
...
* Fix ccache cache miss
* Do not define _VERSION_ in python bindings
2025-09-09 07:41:05 +09:00
Awni Hannun
e5a33f2223
faster depthwise 1D conv ( #2567 )
2025-09-08 11:37:23 -07:00
Cheng
c1e3340b23
Set ccache size before building ( #2570 )
2025-09-07 09:00:31 +09:00
XXXXRT666
8f163a367d
typing: add type hints to mlx.core.array, linalg, distributed, and random ( #2565 )
...
* Add type annotations to mlx methods
* Missing list_or_scalar
2025-09-04 09:08:11 -07:00
Manuel Villanueva
89a3df9014
Fixed several type annotations in the MLX stubs which degraded to Unknown/Any ( #2560 )
* Added scalar to stubs to fix Unknown Type Hint
### Proposed changes
Issue #2478 reports that several type annotations in the MLX stubs degrade to Unknown/Any in editors like VS Code with Pylance, due to missing imports (Union, Optional, Tuple) and an undefined scalar type alias.
This PR updates the stub generation patterns to:
• Add missing typing imports in mlx.core.__prefix__ so that Union, Optional, Tuple, etc. are always available.
• Define and export scalar: TypeAlias = Union[int, float, bool] in mlx.core.__suffix__ so that functions typed with Union[scalar, array] resolve correctly instead of falling back to Any.
• Update submodule stub prefixes (distributed, fast, linalg, metal, random) to import scalar alongside array, Device, and Stream, ensuring type checkers resolve the union consistently across modules.
With these changes, functions like mlx.add now display rich type signatures such as:
```python
def add(
    a: scalar | array,
    b: scalar | array,
    stream: Stream | Device | None = None,
) -> array: ...
```
instead of degrading to Any.
### Checklist
• I have read the CONTRIBUTING document
• I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
• I have added tests that prove my fix is effective or that my feature works (n/a — stub generation only)
• I have updated the necessary documentation (if needed)
* add bool to patterns
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-09-03 12:52:08 -07:00
Krishi Saripalli
c5d2937aa5
chore: Update Docs With Slice Copy Example ( #2559 )
* chore: updated docs with slice copy example
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-09-02 22:07:02 -07:00
Awni Hannun
b61a65e313
fix copies in sdpa ( #2563 )
2025-09-02 11:00:36 -07:00
wrmsr
04cbb4191c
Fix dequantize python sig ( #2562 )
2025-09-01 11:50:20 -07:00
Artur Antonov
c5460762e7
Fix AdamW weight_decay default value in docstring ( #2557 )
2025-08-31 21:29:30 -07:00
Awni Hannun
8ce49cd39e
fix quantized vjp for mxfp4 ( #2555 )
2025-08-29 10:06:15 -07:00
Awni Hannun
9c68b50853
version bump ( #2554 )
2025-08-29 06:54:17 -07:00
Awni Hannun
111f1e71af
Faster contiguous gather for indices in the first axis ( #2552 )
* faster contiguous gather for indices in the first axis
* work per thread > 1
* Angelos' suggestion for scales / biases
2025-08-28 21:26:30 -07:00
Awni Hannun
827003d568
fix METAL quantization in JIT ( #2553 )
2025-08-28 18:26:25 -07:00
Awni Hannun
d363a76aa4
Bump xcode in circle ( #2551 )
* bump xcode in circle
* bump xcode in circle
* bump xcode in circle
2025-08-28 13:13:34 -07:00
Awni Hannun
70560b6bd5
Add mode parameter for quantization ( #2499 )
* add mode parameter for quantization
* mxfp4 quantize/dequantize + start of optional biases
* mxfp4 works
* speedup
* cpu mxfp4
* fix
* fix test tol
* fix
* refactor
* add quant mode enum
2025-08-28 06:45:26 -07:00
Awni Hannun
7ef8a6f2d5
[CUDA] fix sort ( #2550 )
* [CUDA] fix sort
* fix test
2025-08-27 19:48:43 -07:00
Cheng
31c6f6e33f
[CUDA] Use ConcurrentContext in concatenate_gpu ( #2549 )
2025-08-28 09:30:08 +09:00
Awni Hannun
584d48458e
link with nccl ( #2546 )
2025-08-27 10:01:07 -07:00
Cheng
5cf984ca87
Separate cpu compilation cache by versions ( #2548 )
2025-08-27 11:25:15 +09:00
Cheng
a9bac3d9e5
Run CPP tests for CUDA build in CI ( #2544 )
2025-08-27 08:06:46 +09:00
Awni Hannun
5458d43247
add load with path tests ( #2543 )
2025-08-26 14:24:47 -07:00
Awni Hannun
a4dba65220
Enable cuda graph toggle ( #2545 )
* enable cuda graph toggle
* increase cache size
2025-08-26 12:50:38 -07:00
Awni Hannun
3dcb286baf
Remove stream from average grads so it uses default ( #2532 )
* Remove stream from average grads so it uses default
* comment
2025-08-25 15:56:29 -07:00
Cheng
4822c3dbe9
[CUDA] Implement DynamicSlice/DynamicSliceUpdate ( #2533 )
* Move DynamicSlice to gpu/primitives
* Implement compute_dynamic_offset in CUDA
2025-08-26 07:31:39 +09:00
Awni Hannun
2ca75bb529
Remove nccl install in release ( #2542 )
2025-08-25 15:20:18 -07:00
Awni Hannun
db14e29a0b
allow pathlib.Path to save/load functions ( #2541 )
2025-08-25 14:58:49 -07:00
Awni Hannun
d2f540f4e0
Use nccl header only when nccl is not present ( #2539 )
* use nccl header only when nccl is not present
* larger machine for cuda build
2025-08-25 14:17:25 -07:00
Cheng
333ffea273
[CUDA] Remove thrust in arange ( #2535 )
2025-08-24 16:22:36 +09:00
Cheng
f55b6f1f2f
Enable COMPILE_WARNING_AS_ERROR for linux builds in CI ( #2534 )
2025-08-24 15:33:08 +09:00
Awni Hannun
30561229c7
Fix allocation bug in NCCL ( #2530 )
2025-08-22 14:39:43 -07:00
Awni Hannun
068a4612e9
nccl default for backend=any ( #2528 )
* nccl default for backend=any
* check num gpus + ensure row contiguous for all reduce
* comment
2025-08-22 12:24:27 -07:00
Andrey Portnoy
5722c147de
[CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 ( #2525 )
* [CUDA] Update cudaMemAdvise and cudaGraphAddDependencies for CUDA 13
These functions' signatures changed in CUDA 13, so we differentiate
between CUDA 13 and preceding releases at compile time.
* Mention NVIDIA in ACKNOWLEDGMENTS.md
2025-08-21 19:57:20 -07:00
Cheng
f6819a1f26
Fix warning 186-D from nvcc ( #2527 )
2025-08-22 10:29:55 +09:00
Awni Hannun
f93f87c802
nccl dep + default for cuda ( #2526 )
2025-08-21 17:57:49 -07:00
Anastasiia Filippova
9392fc3f88
NCCL backend ( #2476 )
2025-08-21 11:56:15 -07:00
Awni Hannun
e843c4d8d5
fix power ( #2523 )
2025-08-21 06:46:01 -07:00
Angelos Katharopoulos
0c5fc63a36
Fix docs omission ( #2524 )
2025-08-20 17:56:06 -07:00
Angelos Katharopoulos
e397177f6e
Custom cuda kernel ( #2517 )
2025-08-20 17:20:22 -07:00
Cheng
f4c8888cbe
[CUDA] Fix stride of singleton dims before passing to cuDNN ( #2521 )
2025-08-21 08:55:26 +09:00
Angelos Katharopoulos
25c1e03205
Fix overflow in large filter small channels ( #2520 )
2025-08-20 08:03:29 -07:00
russellizadi
512281781c
Remove state return from function example in compile documentation ( #2518 )
2025-08-20 00:45:05 -07:00
Cheng
ac85ddfdb7
[CUDA] Add GEMM-based fallback convolution kernels ( #2511 )
* Add gemm_conv
* Add gemm_grouped_conv
2025-08-20 10:06:22 +09:00
Cheng
65d0d40232
Split cuDNN helpers into a separate header ( #2491 )
* Add RAII managed CudaGraph class
* Implement forward rms_norm with cuDNN
* Revert back to old rms norm kernel
2025-08-20 09:29:28 +09:00
Awni Hannun
cea9369610
fix lapack svd ( #2515 )
2025-08-18 15:07:59 -07:00
Awni Hannun
e7c6e1db82
no segfault with uninitialized array.at ( #2514 )
2025-08-18 08:33:38 -07:00
Awni Hannun
c5fcd5b61b
fix custom kernel test ( #2510 )
2025-08-18 06:45:59 -07:00
Angelos Katharopoulos
1df9887998
Ensure no oob read in gemv_masked ( #2508 )
2025-08-17 08:42:33 -07:00
Angelos Katharopoulos
73f22d6226
Ensure small sort doesn't use indices if not argsort ( #2506 )
2025-08-17 08:42:20 -07:00
Cheng
c422050ca7
Update cuDNN Frontend to v1.14 ( #2505 )
2025-08-17 19:13:01 +09:00
Cheng
1ba18ff7d9
[CUDA] Fix conv grads with groups ( #2495 )
* Put reshape utils in one file
* [CUDA] Fix conv grads with groups
* Put the reshape utils in gpu/copy.h
2025-08-16 10:09:18 +09:00
Cheng
37b440faa8
Clean up code handling both std::vector and SmallVector ( #2493 )
2025-08-16 09:01:10 +09:00
Cheng
888b13ed63
Remove the hack around SmallVector in cpu compile ( #2494 )
2025-08-16 08:17:24 +09:00
Cheng
4abb218d21
The naive_conv_2d is no longer used ( #2496 )
2025-08-16 07:57:30 +09:00
Awni Hannun
6441c21a94
Faster general unary op ( #2472 )
* faster general unary op
* faster general ops + reorg
* fix + comment
* binary two
* copy general
2025-08-15 15:04:12 -07:00
Cheng
dfb5022eab
Rename cu::Matmul to CublasGemm ( #2488 )
2025-08-13 09:37:40 +09:00
Daniel Yeh
ac207ce7aa
make code blocks copyable ( #2480 )
Co-authored-by: Chen-Chen Yeh <ge96noj@mytum.de >
2025-08-12 12:29:02 -07:00
Abe Leininger
fce53b61d6
Fix reduce sum/prod overflow ( #2477 )
2025-08-12 00:05:33 -07:00
Angelos Katharopoulos
8ae4a76308
Use CMake <4.1 to avoid the nvpl error ( #2489 )
2025-08-12 00:03:42 -07:00
Cheng
7fde1b6a1e
Fix logsumexp/softmax not fused for some cases ( #2474 )
2025-08-08 14:07:17 -07:00
Cheng
aa7b47481a
[CUDA] Optimize set_mm_device_pointers for small ndim ( #2473 )
2025-08-08 15:23:30 +09:00
Awni Hannun
56be773610
version ( #2470 )
2025-08-07 00:36:04 -07:00
Jagrit Digani
a9bdd67baa
Add CUDA sdpa vector ( #2468 )
2025-08-06 21:40:26 -07:00
Angelos Katharopoulos
f2adb5638d
Fix typo in metal command encoder ( #2471 )
2025-08-06 16:58:23 -07:00
Luca Vivona
728d4db582
Support destination arg in tree flatten/unflatten ( #2450 )
2025-08-06 15:34:59 -07:00
Awni Hannun
db5c7efcf6
revert default cuda install ( #2465 )
* revert default cuda install
* revert default cuda install
2025-08-06 06:19:12 -07:00
Awni Hannun
7bb96e4249
fix cublas on h100 ( #2466 )
2025-08-06 06:18:58 -07:00
Awni Hannun
fa89f0b150
faster gather qmm sorted test ( #2463 )
2025-08-05 06:27:40 -07:00
Awni Hannun
ca973d1e83
fix install tags ( #2464 )
2025-08-04 20:01:23 -07:00
Cheng
828c5f1137
Use SmallVector for shapes and strides ( #2454 )
* Use SmallVector for shapes and strides
* Convert SmallVector to tuple
2025-08-05 09:41:03 +09:00
Gaétan Lepage
7d86a5c108
Feat: add USE_SYSTEM_FMT CMake option ( #2219 )
2025-08-04 16:36:11 -07:00
Awni Hannun
0b807893a7
fix wraps compile ( #2461 )
2025-08-04 16:14:18 -07:00
Awni Hannun
6ad0889c8a
default install cuda on linux ( #2462 )
2025-08-04 15:33:05 -07:00
Zamderax
737dd6d1ac
Add missing <algorithm> header to jit_compiler.cpp ( #2460 )
Fixes compilation error on Linux where std::find_if is used on line 121
but the <algorithm> header was not included. While this might work on
some platforms due to transitive includes, it's not guaranteed by the
C++ standard.
Resolves issue #2459
2025-08-04 14:00:46 -07:00
Cheng
aaf78f4c6b
Use LRU cache for cuda graph ( #2448 )
* Use LRU cache for cuda graph
* Remove unused destructor
2025-08-02 21:28:57 +09:00
Angelos Katharopoulos
8831064493
Fix arctan2 grads ( #2453 )
2025-08-01 21:06:04 -07:00
Angelos Katharopoulos
be9bc96da4
[CUDA] Matmul utils initial commit ( #2441 )
2025-08-01 14:22:25 -07:00
Angelos Katharopoulos
86258f292f
[CUDA] Vectorize generated kernels ( #2444 )
2025-07-31 18:18:57 -07:00
Cheng
b26d88591c
[CUDA] Save primitive inputs faster ( #2449 )
* Add more nvtx loggings
* [CUDA] Saving primitive inputs faster
* Remove unneeded check
2025-08-01 10:16:06 +09:00
Cheng
86c6a15571
[CUDA] Backward convolution ( #2431 )
2025-08-01 09:54:05 +09:00
junpeiz
8b25ce62d5
Add tests for export including control flow models and quantized models ( #2430 )
* Add tests for export, including control flow export and quantized model export.
* Skip quantization related test for CUDA backend.
2025-07-31 11:06:26 -07:00
Awni Hannun
da5912e4f2
fix custom metal extension ( #2446 )
2025-07-31 06:25:36 -07:00
Cheng
daafee676f
Fix wrong graph key when using concurrent context ( #2447 )
2025-07-31 06:01:05 -07:00
Awni Hannun
d32519c8ee
fix gemv regression ( #2445 )
2025-07-30 14:23:01 -07:00
Awni Hannun
b405591249
fix circular reference ( #2443 )
2025-07-30 09:37:44 -07:00
Angelos Katharopoulos
3bf81ed1bd
[CUDA] Quantized refactoring ( #2442 )
2025-07-30 08:27:20 -07:00
Cheng
2204182bba
Make CI faster ( #2440 )
2025-07-30 02:26:36 -07:00
Cheng
3628e5d497
Use load_vector in arg_reduce ( #2439 )
2025-07-30 17:40:26 +09:00
Cheng
a0ae49d397
Move arange to its own file ( #2438 )
2025-07-30 13:05:51 +09:00
Cheng
254476718b
Remove the kernel arg from get_launch_args ( #2437 )
2025-07-30 11:43:02 +09:00
Awni Hannun
3adba92ebe
Cuda faster softmax ( #2435 )
* faster softmax and logsumexp
* faster softmax and logsumexp
* format
2025-07-29 17:18:12 -07:00
Awni Hannun
ef631d63af
faster rms norm ( #2433 )
2025-07-29 13:12:00 -07:00
Cheng
970dbe8e25
Use ccache in CI ( #2414 )
* Detect ccache
* Use ccache in CI
* Separate cache for different images
* Test both 12.2 and 12.9 for PRs
2025-07-29 08:43:22 +09:00
Awni Hannun
641be9463b
Add more CUDA architectures for PyPi package ( #2427 )
* add cuda sm 90
* add more archs
2025-07-28 12:35:15 -07:00
Awni Hannun
ab0e608862
[CUDA] More sizes for gemv ( #2429 )
* route more to gemv
* route more sizes to custom gemv
2025-07-28 12:35:01 -07:00
Awni Hannun
1588659062
no occupancy query for launch params ( #2426 )
2025-07-28 09:09:41 -07:00
Awni Hannun
b9e88fb976
[CUDA] Fix segfault on exit ( #2424 )
* fix cuda segfault on exit
* comment
2025-07-27 08:08:13 -07:00
Awni Hannun
4ad53414dd
fix cuda pypi package ( #2423 )
* fix cuda pypi package
* patch bump
2025-07-25 15:20:29 -07:00
Awni Hannun
d1165b215e
version ( #2420 )
2025-07-25 13:29:28 -07:00
Awni Hannun
dcb8319f3d
update install docs and requirements ( #2419 )
2025-07-25 12:13:19 -07:00
Awni Hannun
5597fa089c
Fix qvm splitk ( #2415 )
2025-07-25 11:50:24 -07:00
Awni Hannun
9acec364c2
[CUDA] Always use batched matmul ( #2404 )
* cuda batched mm
* addmm as well
* comment
2025-07-24 20:46:02 -07:00
Skonor
7d9d6ef456
docs: fix adam and adamw eps placement ( #2416 )
Co-authored-by: Mikhail Gorbunov <m_gorbunov@apple.com >
2025-07-24 16:40:45 -07:00
Cheng
6f5874a2f2
[CUDA] Initial implementation of Convolution with cuDNN ( #2385 )
* Link with cuDNN
* Initial implementation
* Remove backend apis
* Fix recording cudnn conv
* More unused backend apis
* Fix C++ conv tests
* include cudnn as python dep
* Install libcudnn9-dev-cuda-12 in CI
* cudnn only accepts contiguous inputs
* Switch to backend apis
* Plan needs to be kept alive
* Turn off tf32
* Add cache
* Test the native cuda graph api
* Set cudnn stream before execution
* Make LRUCache more like a normal container
* Do error check for cublas handle
* Zero-initializing array
* Use tf32 for conv
* Skip TestConv.test_torch_conv_2D test
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-07-25 08:12:10 +09:00
Awni Hannun
70dc336785
Test on cuda 12.2 and 12.9 ( #2413 )
2025-07-24 06:06:15 -07:00
Awni Hannun
4e504039f5
[Metal] Release metal events ( #2412 )
* release metal events
* fix
* fix
2025-07-23 19:53:42 -07:00
Awni Hannun
d1f4d291e8
Fix uv install and add dev release ( #2411 )
* fix uv install and add dev release
* fix docstring
* pin cuda deps
* cuda release on cpu-only machine
2025-07-23 16:54:19 -07:00
Awni Hannun
e1840853ce
full row mask in sdpa consistently gives nan ( #2406 )
2025-07-23 16:37:03 -07:00
Cheng
0f5ce173da
[CUDA] --compress-mode requires CUDA 12.8 ( #2407 )
2025-07-23 06:11:11 -07:00
Cheng
588854195f
Remove unused code in Convolution::vjp ( #2408 )
2025-07-23 06:11:00 -07:00
Fangjun Kuang
28d068bce6
Fix an error in the comment for mx.dequantize ( #2409 )
2025-07-23 06:10:50 -07:00
Awni Hannun
d107d8d495
add cuda gemv ( #2400 )
2025-07-22 08:24:13 -07:00
Awni Hannun
1e496ddb82
[CUDA] Simplify allocator ( #2392 )
* simplify allocator and fix race with small pool
* Don't use shared event in worker
* use cuda buffer in small pool
* comment
* comment
2025-07-22 08:24:01 -07:00
Awni Hannun
74eccbf3fa
use size option in binary ( #2399 )
2025-07-22 07:00:53 -07:00
Awni Hannun
08638223ca
Fix including stubs in wheel ( #2398 )
* fix including stubs in wheel
* fix bool_
2025-07-22 06:30:17 -07:00
Cheng
56cc858af9
Add contiguous_copy_cpu util for copying array ( #2397 )
2025-07-21 07:30:35 -07:00
Cheng
f55c4ed1d6
Remove thrust iterators ( #2396 )
2025-07-21 07:30:27 -07:00
Awni Hannun
93d70419e7
[CUDA] speedup handling scalars ( #2389 )
* speedup scalars in cuda
* comment
2025-07-18 21:47:31 -07:00
Awni Hannun
63f663d9c6
fix cuda manylinux version to match others ( #2388 )
2025-07-18 21:02:16 -07:00
Awni Hannun
84b4d96efa
fix release build + patch bump ( #2387 )
2025-07-18 14:47:37 -07:00
Awni Hannun
aec67f2fa6
patch bump ( #2386 )
2025-07-18 12:25:48 -07:00
Gökdeniz Gülmez
deee214a95
Adding support for the Muon Optimizer ( #1914 )
* initial commit with working optimizer
* update ACKNOWLEDGMENTS.md
* nits and adding it to test
* nits
* G.astype(mx.bfloat16) to G.astype(G.dtype)
* G.ndim >= 2 to assert G.ndim == 2
* remove comments
* replace with mx.addmm
* remove comments
* format
* nits
* match muon
* fix addmm
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-07-18 12:25:28 -07:00
Cheng
45adec102c
Add contiguous_copy_gpu util for copying array ( #2379 )
2025-07-18 06:44:25 -07:00
Cheng
31fc530c76
[CUDA] Add more ways finding CCCL headers in JIT ( #2382 )
2025-07-17 15:25:34 -07:00
Awni Hannun
fbb3f65a1a
fix resource leaks in matmul and graph ( #2383 )
2025-07-17 06:50:15 -07:00
Angelos Katharopoulos
6b1b8ea91b
[CUDA] Add work per thread to compile ( #2368 )
2025-07-17 06:47:52 -07:00
Awni Hannun
b2273733ea
Test with CUDA 12.2 ( #2375 )
* Test with CUDA 12.0
* try older image
* fix cpu sort
2025-07-16 13:00:37 -07:00
Awni Hannun
f409b229a4
fix ring distributed test ( #2380 )
2025-07-16 11:25:24 -07:00
Cheng
30571e2326
Rename the copy util in cpu/copy.h to copy_cpu ( #2378 )
2025-07-16 07:34:24 -07:00
Awni Hannun
d7734edd9f
fix complex reduce + nan propagation in min and max ( #2377 )
2025-07-15 18:19:47 -07:00
Awni Hannun
2ba69bc8fa
lower memory uniform sampling ( #2361 )
* lower memory uniform
* use fp32
* fix
2025-07-15 14:22:07 -07:00
Cheng
cb349a291c
[CUDA] Use cuda::std::complex in place of cuComplex ( #2372 )
2025-07-15 00:36:13 -07:00
Awni Hannun
f0a0b077a0
Install linux with mlx[cuda] and mlx[cpu] ( #2356 )
* install linux with mlx[cuda] and mlx[cpu]
* temp for testing
* cleanup circle, fix cuda repair
* update circle
* update circle
* decouple python bindings from core libraries
2025-07-14 17:17:33 -07:00
Awni Hannun
49114f28ab
fix flaky test ( #2371 )
2025-07-14 17:16:18 -07:00
Awni Hannun
e7d2ebadd2
[CUDA] Affine quantize ( #2354 )
* affine quantize and dequantize kernels
* format
* fix
* format
2025-07-14 15:45:44 -07:00
Awni Hannun
e569803d7c
update linux build ( #2370 )
2025-07-14 15:13:56 -07:00
Cheng
d34f887abc
Add Primitive::name and remove Primitive::print ( #2365 )
2025-07-14 14:06:35 -07:00
Angelos Katharopoulos
5201df5030
Fix imag() vjp ( #2367 )
2025-07-14 13:11:16 -07:00
Cheng
2d3c26c565
[CUDA] Do not put kernels in anonymous namespace ( #2362 )
2025-07-12 14:24:45 -07:00
Cheng
6325f60d52
[CUDA] Bundle CCCL for JIT compilation ( #2357 )
* Ship CCCL for JIT compilation
* Remove cexpf
2025-07-11 18:45:37 -07:00
Awni Hannun
42cc9cfbc7
fix copy dispatch ( #2360 )
2025-07-11 10:59:35 -07:00
Cheng
8347575ba1
[CUDA] Implement Scan kernel ( #2347 )
* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal
2025-07-10 16:54:12 -07:00
Angelos Katharopoulos
b6eec20260
Fix edge check in qmm_n QuantizedLoader ( #2355 )
2025-07-10 16:28:50 -07:00
Angelos Katharopoulos
0eb035b4b1
Fix type promotion in Adam with bias correction ( #2350 )
2025-07-10 11:14:42 -07:00
Cheng
afb9817599
[CUDA] Put version in ptx cache dir path ( #2352 )
2025-07-10 07:24:21 -07:00
Cheng
8fb3e7a26c
[CUDA] Set current device before cudaGraphLaunch ( #2351 )
2025-07-10 07:24:02 -07:00
jhavukainen
8c7bc30ce4
Align mlx::core::min op nan propagation with NumPy ( #2346 )
2025-07-10 06:20:43 -07:00
Cheng
85873cb162
[CUDA] Do vectorized store/load in contiguous elementwise ops ( #2342 )
* Do vectorized store/load in unary ops
* Do vectorized store/load in binary_two ops
* Do vectorized store/load in copy ops
* Do vectorized store/load in ternary ops
* Use int32_t for IdxT
* binary => binary_two in binary_two.cu
* Fix tests on large arrays
* Use uint as index type
* Contig uses uint as index and non-contig uses int
2025-07-09 18:48:43 -07:00
Awni Hannun
e14ee12491
add zero for argsort vjp ( #2345 )
2025-07-09 14:37:14 -07:00
jhavukainen
8b9a3f3cea
Align mlx::core::max op nan propagation with NumPy ( #2339 )
* Make max op NaN propagation rules align with numpy
* Adding benchmarks and testing for max op nanpropagation
* Pre-commit formatting
* Fix max complex64 nan propagation and add test
* Improve the cpp unittest
* Only check nans on non-integral types in simd_reduce_impl.
* Cleanup using namespace alias
* Add cpu Max nanpropagation. Fix a small fib in cpu max dispatch data types for int8/int16.
* Make the max nanpropagation test more meaningful for integer types
* Remove tuple unpacking syntax to comply with earlier python versions. Add cuda skip to nanpropagation tests, fix cuda implementation in a separate PR.
2025-07-09 11:26:27 -07:00
Awni Hannun
fb4e8b896b
patch bump ( #2343 )
2025-07-08 14:26:07 -07:00
Cheng
2ca533b279
Fix compilation with CUDA 11 ( #2331 )
2025-07-07 20:00:43 -07:00
Angelos Katharopoulos
4a9b29a875
MoE backward improvements ( #2335 )
2025-07-07 17:59:53 -07:00
Awni Hannun
a4fcc893cd
auto build linux release ( #2341 )
2025-07-07 09:29:23 -07:00
Cheng
9d10239af7
[CUDA] Do vectorized store/load in binary ops ( #2330 )
2025-07-07 08:44:14 -07:00
Cheng
19facd4b20
Build with all cpu cores by default ( #2336 )
2025-07-07 06:06:45 -07:00
Angelos Katharopoulos
f5299f72cd
Fix layernorm race condition ( #2340 )
2025-07-07 06:06:01 -07:00
Cheng
0e0d9ac522
[CUDA] Add MLX_CUDA_GRAPH_CACHE_SIZE env for setting graph cache size ( #2329 )
2025-07-05 08:33:29 -07:00
Awni Hannun
8917022deb
fix graphs for older cuda ( #2328 )
2025-07-02 19:37:58 -07:00
Awni Hannun
ec0d5db67b
[CUDA] Switch to CUDA graphs ( #2317 )
* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
consistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment
2025-07-02 15:59:13 -07:00
Cheng
e76e9b87f0
Fix compilation error from integral_constant ( #2326 )
2025-07-02 06:04:38 -07:00
Awni Hannun
cfb6a244ea
allow parameters to be deleted ( #2325 )
2025-07-01 21:27:23 -07:00
Awni Hannun
58f3860306
patch bump ( #2324 )
2025-07-01 12:12:16 -07:00
Awni Hannun
dd4f53db63
use fp32 for testing, add more complex ops ( #2322 )
2025-07-01 07:30:00 -07:00
Angelos Katharopoulos
3d5e17e507
MLX_SWITCH macros to templates ( #2320 )
2025-07-01 01:33:44 -07:00
Awni Hannun
33bf1a244b
Fix module update in strict mode ( #2321 )
* fix module update in strict mode
* allow GELU to be pickled
2025-06-29 11:12:29 -07:00
Angelos Katharopoulos
772f471ff2
[CUDA] Fix reductions ( #2314 )
2025-06-27 12:59:20 -07:00
Angelos Katharopoulos
2c11d10f8d
Split broadcast so it is always fused in compile ( #2318 )
2025-06-26 22:08:18 -07:00
Angelos Katharopoulos
656ed7f780
Fix get 2d grid dims ( #2316 )
2025-06-25 13:03:09 -07:00
Awni Hannun
81bb9a2a9e
Compile float64 functions on CPU ( #2311 )
2025-06-24 10:18:52 -07:00
Angelos Katharopoulos
5adf185f86
Fix update_modules() when providing a subset ( #2308 )
2025-06-20 17:19:46 -07:00
Awni Hannun
c9a9180584
Cuda perf tuning ( #2307 )
* perf tuning
* fix adding input arrays in matmul / sort
* format
* fix
2025-06-20 14:50:57 -07:00
Awni Hannun
76831ed83d
Build CUDA release in Circle ( #2306 )
* cuda release
* add license
2025-06-19 15:26:36 -07:00
Angelos Katharopoulos
b3d7b85376
Make ptx cache settable by environment variable ( #2304 )
2025-06-17 23:55:56 -07:00
Awni Hannun
cad5c0241c
[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )
* cuda synch properly waits for all tasks to finish and clear
* fix copy
2025-06-17 12:03:25 -07:00
Awni Hannun
b8022c578a
divmod, partition, sort fixes ( #2302 )
2025-06-16 18:49:32 -07:00
Awni Hannun
bc53f8293f
Cuda bug fixes 2 ( #2298 )
* more bug fixes
* more bug fixes
* format
2025-06-16 13:14:46 -07:00
Awni Hannun
c552ff2451
[CUDA] Fix back-end bugs and enable corresponding tests ( #2296 )
* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format
2025-06-16 08:45:40 -07:00
Awni Hannun
4fda5fbdf9
add python testing for cuda with ability to skip list of tests ( #2295 )
2025-06-15 10:56:48 -07:00
Angelos Katharopoulos
580776559b
RoPE for CUDA ( #2293 )
* First working CUDA rope
* Fix random
2025-06-15 06:08:07 -07:00
Awni Hannun
a14aaa7c9d
Fix cuda arg reduce ( #2291 )
2025-06-14 17:54:00 -07:00
Awni Hannun
a6d780154f
fix cuda gemm for bf16 ( #2288 )
2025-06-13 22:10:46 -07:00
Awni Hannun
6871e2eeb7
fix cuda jit ( #2287 )
2025-06-13 19:21:46 -07:00
Awni Hannun
8402a2acf4
Fix complex power and print ( #2286 )
* fix complex power and print
* fix complex matmul shape
2025-06-13 11:13:00 -07:00
Jagrit Digani
fddb6933e1
Collection of refactors ( #2274 )
* Refactor gemv into a function
* Refactor splitk step 1
* Refactor split k axpby
* Rearrange steel_gemm_regular
* Redirect steel_gemm_regular
* Add axpby routing to steel_matmul_regular
* Refactor AddMM step 1
* Redirect steel_gemm
* Update addmm
* Comments and format
* Some cleanup
* Add architecture gen to device
* Update no copy condition in normalization to account for axis size 1
2025-06-13 10:44:56 -07:00
Cheng
c8b4787e4e
CUDA backend: indexing ops ( #2277 )
2025-06-12 21:44:19 -07:00
Awni Hannun
2188199ff8
[CUDA] ternary with select op ( #2283 )
* cuda ternary with select op
* comment + fix
* fix
2025-06-12 20:24:43 -07:00
Awni Hannun
aa07429bad
Fix cuda build ( #2284 )
2025-06-12 17:48:05 -07:00
Awni Hannun
918761a25a
[CUDA] RMSNorm and VJP ( #2280 )
* rms norm start
* nit
2025-06-12 17:09:49 -07:00
Cheng
a4fc671d3e
CUDA backend: compile ( #2276 )
* CUDA backend: compile
* Rename kernels/ to device/
2025-06-12 17:08:39 -07:00
Awni Hannun
f5f65ef48c
Make sliceUpdate general ( #2282 )
* Make sliceUpdate general
* fix
2025-06-12 16:48:54 -07:00
Cheng
c2dd81a8aa
Fix warnings from latest CUDA toolkit ( #2275 )
2025-06-12 06:03:01 -07:00
Cheng
d7e680ffe4
CUDA backend: layernorm ( #2271 )
2025-06-11 15:48:32 -07:00
Cheng
c371baf53a
CUDA backend: softmax ( #2272 )
2025-06-11 13:55:22 -07:00
Cheng
ccf78f566c
CUDA backend: argreduce ( #2270 )
2025-06-11 13:26:17 -07:00
Cheng
c9fa68664a
CUDA backend: reduce ( #2269 )
2025-06-11 11:22:25 -07:00
Awni Hannun
c35f4d089a
start cuda circle config ( #2256 )
* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config
2025-06-10 21:19:47 -07:00
Angelos Katharopoulos
8590c0941e
Add load_safe to the general conv loaders ( #2258 )
2025-06-10 20:58:16 -07:00
Cheng
095163b8d1
Fix building cpp benchmarks on Linux ( #2268 )
2025-06-10 17:10:24 -07:00
Cheng
99c33d011d
rebase + nit ( #2260 )
Co-authored-by: Awni Hannun <awni@apple.com >
2025-06-10 10:51:51 -07:00
Awni Hannun
62fecf3e13
fix conv export ( #2265 )
2025-06-10 09:34:01 -07:00
Cheng
7c4eb5d03e
CUDA backend: random ( #2261 )
2025-06-10 08:59:56 -07:00
Cheng
bae9a6b404
CUDA backend: sort ( #2262 )
Co-authored-by: Awni Hannun <awni@apple.com >
2025-06-10 08:59:47 -07:00
Christopher Fleetwood
004c1d8ef2
Report number of missing parameters ( #2264 )
* chore: inform
* chore: format
---------
Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com >
2025-06-10 06:37:50 -07:00
Cheng
7ebb2e0193
CUDA backend: binary ops ( #2259 )
2025-06-10 06:37:40 -07:00
Awni Hannun
9ce77798b1
fix export to work with gather/scatter axis ( #2263 )
2025-06-09 20:37:27 -07:00
Cheng
f8bad60609
CUDA backend: unary ops ( #2158 )
2025-06-09 06:45:08 -07:00
Emmanuel Ferdman
5866b3857b
Refactor the lu test ( #2250 )
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com >
2025-06-07 06:12:08 -07:00
Awni Hannun
1ca616844b
Fix unintuitive metal kernel caching ( #2242 )
* Fix unintuitive metal kernel caching
* alternative solution
2025-06-06 20:08:15 -07:00
Angelos Katharopoulos
2e8cf0b450
Change layernorms to two pass algorithm ( #2246 )
2025-06-06 13:34:56 -07:00
Cheng
24f89173d1
CUDA backend: matmul ( #2241 )
2025-06-06 12:24:04 -07:00
Awni Hannun
c6a20b427a
Improve metal elementwise kernels ( #2247 )
* improve metal elementwise kernels
* compile and copy
* fix jit
2025-06-06 11:37:40 -07:00
Awni Hannun
a5ac9244c4
fix linux linking error ( #2248 )
2025-06-06 10:41:51 -07:00
Awni Hannun
c763fe1be0
default strict mode for module update and update_modules ( #2239 )
2025-06-05 15:27:02 -07:00
Cheng
52dc8c8cd5
Add profiler annotations in common primitives for CUDA backend ( #2244 )
2025-06-04 19:55:12 -07:00
Angelos Katharopoulos
aede70e81d
Perf regression fix ( #2243 )
2025-06-03 17:55:12 -07:00
Cheng
85a8beb5e4
Avoid atomic updates across CPU/GPU in CUDA event ( #2231 )
2025-06-03 16:49:06 -07:00
Cheng
0bb89e9e5f
Share more common code in Compiled ( #2240 )
* Share more common code in Compiled
* Remove build_lib_name
2025-06-03 16:48:50 -07:00
Cheng
5685ceb3c7
Avoid invoking allocator::malloc when creating CUDA event ( #2232 )
2025-06-03 16:48:40 -07:00
Suryash Malviya
0408ba0a76
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm ( #2220 )
* Implementing Complex Matmul using Karatsuba Algorithm
* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-06-02 15:58:46 -07:00
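The entry above names Karatsuba's trick for complex matrix multiplication. The sketch below illustrates the general three-multiplication identity in pure Python; it is not MLX's implementation, and the helper names (`matmul`, `add`, `sub`, `complex_matmul_3m`) are hypothetical. Writing A = Ar + i*Ai and B = Br + i*Bi, the imaginary part Ar*Bi + Ai*Br can be recovered as (Ar + Ai)(Br + Bi) - Ar*Br - Ai*Bi, so one of the four real matrix products is traded for a few elementwise additions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive real matrix product; each call counts as one real matmul."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for row in a
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def complex_matmul_3m(
    ar: Matrix, ai: Matrix, br: Matrix, bi: Matrix
) -> Tuple[Matrix, Matrix]:
    """(ar + i*ai) @ (br + i*bi) using three real matmuls instead of four."""
    t1 = matmul(ar, br)                    # Ar Br
    t2 = matmul(ai, bi)                    # Ai Bi
    t3 = matmul(add(ar, ai), add(br, bi))  # (Ar + Ai)(Br + Bi)
    real = sub(t1, t2)                     # Ar Br - Ai Bi
    imag = sub(sub(t3, t1), t2)            # Ar Bi + Ai Br
    return real, imag
```

The saving matters when the matrix products dominate the cost; the extra subtractions can lose some accuracy through cancellation, which is the usual trade-off of this method.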
Awni Hannun
cbad6c3093
version ( #2237 )
2025-06-02 15:58:33 -07:00
Cheng
1b021f6984
Fast primitives decide when to use the fallback ( #2216 )
2025-06-02 13:26:37 -07:00
Cheng
95b7551d65
Do not check event.is_signaled() in eval_impl ( #2230 )
2025-06-02 13:23:34 -07:00
Cheng
db5a7c6192
Add memory cache to CUDA backend ( #2221 )
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf cannot be null
2025-05-30 12:12:54 -07:00
Awni Hannun
6ef2f67e7f
5bit quants ( #2226 )
* 5bit quants
* 5bit quants
2025-05-30 12:12:10 -07:00
Cheng
f76ee1ffd2
Move some dims utils to common ( #2223 )
2025-05-29 06:48:30 -07:00
Cheng
54a71f270a
Remove unused defines ( #2217 )
2025-05-23 06:14:58 -07:00
Awni Hannun
55b4062dd8
copyright in docs ( #2214 )
2025-05-21 17:13:04 -07:00
Cheng
79071bfba4
Fix out-of-bounds default value in logsumexp/softmax ( #2213 )
2025-05-21 07:25:16 -07:00
Cheng
7774b87cbd
Remove redundant simd_sum in logsumexp ( #2210 )
2025-05-21 07:25:03 -07:00
Cheng
35c87741cf
Build for compute capability 70 instead of 75 ( #2209 )
2025-05-20 19:42:48 -07:00
Jack Wind
4cbe605214
Feat: Allow per-target Metal debug flags ( #2201 )
* feat: allow per-target Metal debug flags
* formatting fix
2025-05-20 10:22:26 -07:00
Clement Liaw
ab8883dd55
include mlx::core::version() symbols in the mlx static library ( #2207 )
2025-05-20 07:39:11 -07:00
Awni Hannun
eebe73001a
fix large arg reduce ( #2206 )
2025-05-19 13:10:44 -07:00
Angelos Katharopoulos
0359bf02c9
Nearest upsample ( #2202 )
2025-05-19 11:23:38 -07:00
Cheng
237f9e58a8
Fix BEFORE keyword in target_include_directories ( #2204 )
2025-05-19 06:10:44 -07:00
Awni Hannun
8576e6fe36
fix conv2d bug + faster conv 1d ( #2195 )
* fix conv2d bug + faster conv 1d
* revert sort + flaky test
2025-05-18 06:05:11 -07:00
Angelos Katharopoulos
0654543dcc
Add complex eigh ( #2191 )
2025-05-18 00:18:43 -07:00
Awni Hannun
48ef3e74e2
reduce vjp for all and any ( #2193 )
2025-05-16 08:38:49 -07:00
Cheng
7d4b378952
Include cuda_bf16.h for bfloat16 overloads ( #2192 )
* Include cuda_bf16.h for bfloat16 overloads
* Add NO_GPU_MULTI(Eig) in cuda backend
2025-05-16 06:44:42 -07:00
Jack Wind
7ff5c41e06
Add set_threadgroup_memory_length to CommandEncoder ( #2183 )
2025-05-16 00:28:03 -07:00
Awni Hannun
602f43e3d1
fix conv grad ( #2187 )
2025-05-15 19:20:36 -07:00
Awni Hannun
a2cadb8218
real and imag properties ( #2189 )
2025-05-15 18:17:50 -07:00
Awni Hannun
c1eb9d05d9
non-symmetric eig and eigh ( #2188 )
2025-05-15 13:01:44 -07:00
Angelos Katharopoulos
cf6c939e86
Fix some complex vjps ( #2178 )
2025-05-14 23:37:12 -07:00
Angelos Katharopoulos
130df35e1b
Add random normal distribution for complex numbers ( #2182 )
2025-05-13 22:43:45 -07:00
Cheng
0751263dec
Fix typo in row_reduce_small ( #2179 )
2025-05-13 20:19:54 -07:00
Cheng
eca2f3eb97
Add remove_index utility ( #2173 )
2025-05-13 17:09:56 -07:00
Angelos Katharopoulos
3aa9cf3f9e
Fix put_along_axis for empty arrays ( #2181 )
2025-05-13 14:27:53 -07:00
Awni Hannun
8f3d208dce
Close a couple edge case bugs: hadamard and addmm on empty inputs ( #2177 )
* handle hadamard and addmm on empty inputs
* fix
2025-05-12 10:48:57 -07:00
Ivan Fioravanti
caaa3f1f8c
Small typos in mx.metal deprecations ( #2176 )
2025-05-11 06:03:47 -07:00
Awni Hannun
659a51919f
patch bump ( #2162 )
2025-05-09 14:35:14 -07:00
Awni Hannun
6661387066
Fix fft for integer overflow ( #2161 )
2025-05-09 14:25:12 -07:00
ATurker
a7fae8a176
fix: conv_general differences between gpu, cpu ( #2070 )
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-05-09 10:26:52 -07:00
Cheng
0cae0bdac8
CUDA backend: backbone ( #2075 )
2025-05-06 21:26:46 -07:00
Awni Hannun
5a1a5d5ed1
fix input coherent kernel launch ( #2153 )
2025-05-05 17:30:50 -07:00
Cheng
1683975acf
Move common gpu primitives to backend/gpu ( #2145 )
2025-05-05 13:45:29 -07:00
Awni Hannun
af705590ac
fix batched vector sdpa ( #2152 )
2025-05-05 13:13:03 -07:00
Awni Hannun
825124af8f
fix bw for elementwise ops ( #2151 )
* fix bw for elementwise ops
* add compile
* fix
* fix
* fix
* fix
2025-05-05 06:15:04 -07:00
Awni Hannun
9c5e7da507
fix compile merging ( #2150 )
2025-05-02 15:08:50 -07:00
Angelos Katharopoulos
481349495b
GPU Hadamard for large N ( #1879 )
2025-05-01 17:19:17 -07:00
Awni Hannun
9daa6b003f
fix shapeless export ( #2148 )
2025-05-01 15:02:02 -07:00
Angelos Katharopoulos
a3a632d567
Fix the launcher when run locally ( #2147 )
2025-05-01 12:56:09 -07:00
Awni Hannun
e496c5a4b4
fix integer overflow in qmm ( #2143 )
2025-04-30 09:28:56 -07:00
Cheng
ea890d8710
Remove metal-only tests ( #2139 )
2025-04-30 09:08:39 -07:00
Awni Hannun
aa5d84f102
Allow quant layer to be unfrozen ( #2142 )
2025-04-30 09:08:29 -07:00
Awni Hannun
f1606486d2
Generalize gpu backend ( #2138 )
* generalize gpu backend
* fix no_gpu build
* fix no_gpu build
* generalize gpu backend
2025-04-30 09:08:17 -07:00
Cheng
87720a8908
Fix building with uv ( #2141 )
2025-04-30 06:04:07 -07:00
Aashiq Dheeraj
bb6565ef14
add fftshift and ifftshift fft helpers ( #2135 )
* add fftshift and ifftshift fft helpers
* address comments
* axes have to be iterable
* fix fp error in roll + add test
---------
Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local >
2025-04-29 22:13:45 -07:00
Awni Hannun
7bb063bcb3
Enable vjp for quantized scale and bias ( #2129 )
* Enable vjp for quantized scale and bias
* higher tol
2025-04-29 13:03:09 -07:00
Alex Chi Z.
b36dd472bb
return library if it is successfully loaded ( #2131 )
2025-04-29 07:30:36 -07:00
hdeng-apple
167b759a38
Fix typos ( #2136 )
2025-04-29 07:26:05 -07:00
charan-003
99b9868859
Clarify dimension notation in conv1d, conv2d, and conv3d docstrings ( #2123 )
* Clarify dimension notation in conv1d, conv2d, and conv3d docstrings
* Updating transposed convs in conv1d, conv2d, and conv3d
---------
Co-authored-by: Sai Charan Arvapally <saicharan@Sais-MacBook-Pro.local >
2025-04-25 12:18:30 -07:00
1ndig0
6b2d5448f2
Fix the error message in mx.right_shift and mx.left_shift ( #2121 )
* update right_shift and left_shift
* simplify
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-04-25 09:14:28 -07:00
Awni Hannun
eaf709b83e
patch ( #2119 )
2025-04-24 16:11:07 -07:00
Angelos Katharopoulos
f0e70afff0
Fix swift pm load ( #2117 )
2025-04-24 10:58:29 -07:00
hdeng-apple
86984cad68
Remove static initializers ( #2059 )
* Remove static initializers in device.cpp, load.cpp, pocketfft.h
* Remove static initializer InTracing::trace_stack
* Remove static initializer of CompilerCache cache
* Revert changes in pocketfft.h
* Remove duplicate private section of thread_pool()
2025-04-24 06:14:49 -07:00
Awni Hannun
fbc89e3ced
fix pinv ( #2110 )
2025-04-23 13:08:28 -07:00
hdeng-apple
38c1e720c2
Search mlx.metallib in macOS framework "Resources" dir ( #2061 )
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-04-23 09:53:13 -07:00
Param Thakkar
600e87e03c
Added output_padding parameters in conv_transpose ( #2092 )
2025-04-23 09:26:33 -07:00
Hyunsung Lee
3836445241
Add broadcast_shapes in python API ( #2091 )
2025-04-22 18:57:39 -07:00
Yury Popov
1d2c9d6a07
Complex scan ( #2094 )
2025-04-22 18:56:28 -07:00
Awni Hannun
e8ac6bd2f5
irfft throws instead of segfaults on scalars ( #2109 )
2025-04-22 10:25:55 -07:00
Awni Hannun
fdadc4f22c
Add more complex unary ops ( #2101 )
2025-04-21 13:04:54 -07:00
Awni Hannun
79b527f45f
conv vmap ( #2102 )
2025-04-21 13:04:39 -07:00
Awni Hannun
dc4eada7f0
Use unordered map for kwargs in export/import ( #2087 )
* use unordered map for kwargs in export/import
* comment
2025-04-21 07:17:22 -07:00
Cheng
70ebc3b598
Return const ref in array::data_shared_ptr ( #2100 )
2025-04-21 07:17:09 -07:00
Cheng
b13f2aed16
Introduce macros for dispatching dynamic dtypes as static types ( #2073 )
2025-04-19 06:16:30 -07:00
Param Thakkar
5f04c0f818
Fixed shift operations issue ( #2080 )
* Fixed shift operations issue
* Added tests and fixes
* Fixed loop syntax error
* Added tests for bool
* Fixed typo
2025-04-18 14:28:33 -07:00
Awni Hannun
55935ccae7
fix py gc edge case ( #2079 )
2025-04-18 12:46:53 -07:00
Awni Hannun
b529515eb1
minor bump ( #2081 )
2025-04-17 14:57:11 -07:00
Angelos Katharopoulos
3cde719eb7
Route to gather qmm only for many tokens per expert ( #2082 )
2025-04-17 14:53:08 -07:00
Angelos Katharopoulos
5de6d94a90
Gather qmm batched kernel and refactoring of quantized ( #2078 )
2025-04-17 13:53:11 -07:00
Angelos Katharopoulos
99eefd2ec0
Gather mm new kernel and small refactoring ( #2040 )
2025-04-14 16:37:36 -07:00
Yury Popov
e9e268336b
LogCumSumExp ( #2069 )
2025-04-13 01:27:29 -07:00
Awni Hannun
7275ac7523
Fix release build ( #2072 )
2025-04-12 20:41:58 -07:00
Angelos Katharopoulos
c4189a38e4
Add float mask to sdpa vector ( #2068 )
2025-04-11 17:29:40 -07:00
Awni Hannun
68d1b3256b
nit: fix exception handling ( #2066 )
2025-04-11 14:12:08 -07:00
Awni Hannun
9c6953bda7
Fix stubgen ( #2065 )
* Fix stubgen
* add multi optim to docs
2025-04-11 12:02:54 -07:00
Awni Hannun
ef7ece9851
fix fft bug ( #2062 )
2025-04-10 19:41:27 -07:00
Angelos Katharopoulos
ddaa4b7dcb
Fix the test and add custom min/max reductions for uncommon MPI types ( #2060 )
2025-04-10 17:01:17 -07:00
Cheng
dfae2c6989
Fix MSVC build due to use of M_LN2 ( #2058 )
2025-04-10 07:41:41 -07:00
Anastasiia Filippova
515f104926
Min / max reductions ( #2041 )
2025-04-09 23:22:20 -07:00
Angelos Katharopoulos
9ecefd56db
Do not load the default lib if another is requested ( #2055 )
2025-04-09 13:31:38 -07:00
Awni Hannun
e5d35aa187
no sdpa in grad ( #2054 )
2025-04-08 19:13:54 -07:00
Awni Hannun
00794c42bc
Fix causal mask sdpa vec ( #2053 )
* fix sdpa vector causal mask
* test
2025-04-08 09:11:23 -07:00
Cheng
08a1bf3f10
Remove Event::Signal() ( #2052 )
2025-04-08 06:20:27 -07:00
Awni Hannun
60c4154346
Only request residency once ( #2051 )
2025-04-07 10:47:51 -07:00
Awni Hannun
f2c85308c1
add a half simd gemm fallback ( #2046 )
* add a half simd gemm fallback
* nit
2025-04-07 09:31:29 -07:00
Awni Hannun
1a28b69ee2
only add to residency set once ( #2049 )
2025-04-06 17:38:25 -07:00
Cheng
ba09f01ce8
Remove test of converting negative float to uint ( #2048 )
2025-04-06 06:21:46 -07:00
Cheng
6cf48872b7
wait_for_one should wait for task to finish ( #2047 )
2025-04-05 20:05:16 -07:00
Angelos Katharopoulos
7b3b8fa000
Fix ci release ( #2045 )
2025-04-04 20:25:01 -07:00
Awni Hannun
ec5e2aae61
nit in doc ( #2044 )
2025-04-04 12:04:17 -07:00
Awni Hannun
86389bf970
patch bump ( #2043 )
2025-04-03 13:15:18 -07:00
Jagrit Digani
3290bfa690
Add new sdpa function overload ( #2035 )
* Add new sdpa function overload
* Address comments
* Remove std::variant from cpp sdpa function
2025-04-03 11:58:28 -07:00
Jagrit Digani
8777fd104f
Depthwise Conv2D optimization ( #2036 )
- Add a new specialized kernel for depthwise 2D convolutions with small kernels (kernel size <= 7) and small strides (stride <= 2)
- Add related tests
2025-04-03 09:42:04 -07:00
Awni Hannun
c41f7565ed
fix softmax / logsumexp ( #2042 )
2025-04-03 08:32:59 -07:00
Awni Hannun
9ba81e3da4
tune quant dispatch ( #2031 )
2025-04-02 20:05:54 -07:00
Awni Hannun
c23888acd7
Fix build warning ( #2033 )
2025-04-01 14:42:27 -07:00
Awni Hannun
f98ce25ab9
fix residency set for real ( #2032 )
2025-04-01 12:59:48 -07:00
Awni Hannun
de5f38fd48
Custom logsumexp ( #2028 )
* initial custom logsumexp
* more tests
* comments + fix
2025-03-31 07:36:55 -07:00
Angelos Katharopoulos
ec2854b13a
Swap -inf for finite_minimum value ( #2029 )
2025-03-30 21:55:04 -07:00
Stephen Panaro
90823d2938
Add missing funcs to docs ( #2021 )
2025-03-30 18:29:33 -07:00
Jesper Stemann Andersen
5f5770e3a2
Fix CPU sign for unsigned ints ( #2024 )
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-03-30 17:56:59 -07:00
Awni Hannun
28f39e9038
Log for complex numbers in Metal ( #2025 )
* Log for complex numbers in Metal
* fix log2
2025-03-30 17:04:38 -07:00
Awni Hannun
b2d2b37888
fix residency set clearing ( #2027 )
2025-03-30 16:27:26 -07:00
Awni Hannun
fe597e141c
add pinv to doc ( #2020 )
2025-03-30 15:54:18 -07:00
Yi Wang
72ca1539e0
Remove unused variable in /setup.py ( #2026 )
This is a follow up of https://github.com/ml-explore/mlx/pull/2011
2025-03-30 12:52:33 -07:00
Awni Hannun
13b26775f1
use minimum deployment target ( #2016 )
2025-03-28 14:31:53 -07:00
Awni Hannun
05d7118561
causal vector sdpa ( #2018 )
* causal vector sdpa
* get rid of memory threshold
2025-03-28 12:36:13 -07:00
Awni Hannun
98b901ad66
enable complex gemm ( #2017 )
2025-03-28 10:45:13 -07:00
Awni Hannun
5580b47291
iinfo and scalar overflow detection ( #2009 )
2025-03-27 19:54:56 -07:00
Awni Hannun
bc62932984
sdpa specialization for head dim 256 ( #2007 )
2025-03-27 19:31:25 -07:00
Awni Hannun
a6b5d6e759
revise cmake minimum for doctest ( #2014 )
2025-03-27 19:30:58 -07:00
Yi Wang
a8931306e1
Remove unused variable in CMakeBuild ( #2011 )
Fix https://github.com/ml-explore/mlx/issues/2010
2025-03-27 16:00:51 -07:00
Yi Wang
fecdb8717e
Polish CONTRIBUTING.md ( #2005 )
2025-03-25 19:06:34 -07:00
Awni Hannun
916fd273ea
wire cache ( #2006 )
2025-03-25 18:54:01 -07:00
Yi Wang
0da8506552
Update docs for extensions ( #2004 )
2025-03-25 18:35:03 -07:00
Cheng
eda7a7b43e
Do not join threads during process exit on Windows ( #1738 )
2025-03-25 06:33:08 -07:00
Chunyang Wen
022eabb734
Remove unused import ( #1987 )
2025-03-24 20:19:32 -07:00
Awni Hannun
aba899cef8
patch bump ( #2000 )
2025-03-24 12:47:05 -07:00
Jagrit Digani
6a40e1c176
Fix looping limit in causal attention ( #1999 )
2025-03-24 12:28:00 -07:00
Jesper Stemann Andersen
9307b2ab8b
Fixed 32-bit platform support for distributed/ring implementation ( #1996 )
Replaced unsigned long integer literals with explicit size_t values in the ring implementation, e.g., 1UL with size_t(1).
2025-03-24 08:08:40 -07:00
Jesper Stemann Andersen
522d8d3917
Added missing netinet/in.h include that fixes build on FreeBSD ( #1997 )
Defines IPPROTO_TCP.
2025-03-24 08:07:34 -07:00
Awni Hannun
a84cc0123f
promote mask when needed ( #1998 )
2025-03-23 19:58:28 -07:00
Andrey Velichkevich
f018e248cd
fix(backend): Include algorithm library in Allocator ( #1992 )
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com >
2025-03-22 21:27:51 -07:00
Awni Hannun
cfd7237a80
fix docs ( #1991 )
2025-03-21 19:58:53 -07:00
Angelos Katharopoulos
4eef8102c9
Distributed layers ( #1270 )
2025-03-21 13:52:17 -07:00
Angelos Katharopoulos
69e4dd506b
Add a ring all gather ( #1985 )
2025-03-21 13:36:51 -07:00
Angelos Katharopoulos
25814a9458
Disable mpi on version mismatch ( #1989 )
2025-03-21 13:36:26 -07:00
Awni Hannun
2a980a76ce
Add stats and limit to common allocator and enable tests ( #1988 )
* add stats to common allocator and enable tests
* linux memory and default
* fix
2025-03-21 12:28:36 -07:00
Angelos Katharopoulos
d343782c8b
Cross platform libmpi loading ( #1975 )
2025-03-21 11:23:10 -07:00
Awni Hannun
4e1994e9d7
move memory APIs into top level mlx.core ( #1982 )
2025-03-21 07:25:12 -07:00
jiyzhang
65a38c452b
update the formula of smooth_l1_loss ( #1986 )
2025-03-21 06:25:23 -07:00
Awni Hannun
7b7e2352cd
fix malloc or wait deadlock ( #1976 )
2025-03-20 16:48:43 -07:00
Awni Hannun
1177d28395
patch bump ( #1981 )
2025-03-20 15:12:22 -07:00
Awni Hannun
005e7efa64
fix mask in sdpa ( #1980 )
* fix mask in sdpa
* fix attention mask
* Re-enable routing for array mask
---------
Co-authored-by: Jagrit Digani <digani@apple.com >
2025-03-20 14:53:12 -07:00
Jagrit Digani
b42d13ec84
Update attention tests to show diff, disable array masks ( #1978 )
2025-03-20 14:25:38 -07:00
Jagrit Digani
9adcd1a650
Support fused masking in Attention ( #1924 )
* Update API to allow mask='causal' in fast::sdpa
* Add fallback
* Update steel::AttnParams
* Fix typo
* WIP, basic causal
* Update tests
* Update benchmarking
* Update masking loop limits
* Add bool masking and update tests
* Update additive mask
* Update benchmarks
* Update benchmarks
* Update tests
* Update for bfloat error
* Update early exit
* Add random seed to tests
2025-03-20 11:01:32 -07:00
Awni Hannun
3c164fca8c
Fix multistream GPU deadlock ( #1969 )
* fix multistream GPU deadlock
* comments
2025-03-20 07:19:47 -07:00
jiyzhang
95e335db7b
Update smooth_l1_loss in losses.py ( #1974 )
According to the definition of smooth_l1_loss, the line
diff = predictions - targets
should be updated to
diff = mx.abs(predictions - targets)
After this change, the result is consistent with PyTorch's smooth_l1_loss.
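A minimal pure-Python sketch of why the absolute value matters, assuming the PyTorch-style definition with a threshold `beta` (default 1.0) and mean reduction. The function name and list-based inputs are illustrative only; this is not the MLX code in losses.py:

```python
def smooth_l1_loss(predictions, targets, beta=1.0):
    """Mean smooth L1 loss over paired scalar lists (illustrative sketch)."""
    losses = []
    for p, t in zip(predictions, targets):
        # Take the absolute difference first; with a signed diff, every
        # negative error would satisfy diff < beta and wrongly fall into
        # the quadratic branch.
        diff = abs(p - t)
        if diff < beta:
            losses.append(0.5 * diff ** 2 / beta)  # quadratic near zero
        else:
            losses.append(diff - 0.5 * beta)  # linear for large errors
    return sum(losses) / len(losses)


# The loss is symmetric in the sign of the error.
print(smooth_l1_loss([0.0], [3.0]))  # 2.5
print(smooth_l1_loss([3.0], [0.0]))  # 2.5
```

Without the `abs`, `smooth_l1_loss([0.0], [3.0])` would compute diff = -3, take the quadratic branch, and return 4.5 instead of 2.5.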
2025-03-19 20:19:02 -07:00
Awni Hannun
f90206ad74
Guard nullptr dereference ( #1972 )
* guard nullptr dereference
* comment
2025-03-19 16:24:10 -07:00
Chunyang Wen
3779150750
refactor: all use schedule ( #1973 )
2025-03-19 11:24:04 -07:00
Cheng
0a9777aa5c
Do not define MLX_VERSION globally ( #1966 )
2025-03-18 07:12:40 -07:00
Chunyang Wen
45ad06aac8
Fix typo; Fix lint warning when reuse the same name ( #1968 )
* Fix typo; Fix lint warning when reuse the same name
* Add missing period
2025-03-18 07:12:24 -07:00
Awni Hannun
c6ea2ba329
Use same accumulation precision in gemv as gemm ( #1962 )
* use same accumulation precision in gemv as gemm
* faster
* fix compile
2025-03-16 07:13:24 -07:00
Awni Hannun
2770a10240
fix grad with inplace updates ( #1961 )
2025-03-13 19:13:09 -07:00
Awni Hannun
d2a94f9e6a
Only compile warnings as errors for circle ( #1957 )
2025-03-12 13:08:19 -07:00
Awni Hannun
32da94507a
fix vmap for flatten ( #1955 )
2025-03-11 10:42:22 -07:00
Awni Hannun
736a340478
reduce binary size ( #1952 )
2025-03-11 06:30:44 -07:00
Awni Hannun
117e1355a2
fix copy for large arrays ( #1953 )
2025-03-10 15:04:25 -07:00
Awni Hannun
3c3e558c60
Support transposed head/seq for kv ( #1950 )
* support transposed head/seq for kv
* fix flaky test
* nit
2025-03-10 10:53:45 -07:00
Chunyang Wen
cffceda6ee
Add type hint for _extra_repr ( #1948 )
2025-03-10 06:05:36 -07:00
Chunyang Wen
048805ad2c
Remove unused modules ( #1949 )
2025-03-10 06:05:26 -07:00
Chunyang Wen
d14c9fe7ea
Add file info when raising errors in save ( #1943 )
2025-03-08 14:51:04 -08:00
Chunyang Wen
5db90ce822
Fix obscured warning ( #1944 )
2025-03-08 14:50:39 -08:00
Chunyang Wen
d699cc1330
Fix unreachable warning ( #1939 )
* Fix unreachable warning
* Update error message
2025-03-07 17:23:04 -08:00
Awni Hannun
c4230747a1
redesign for faster cpu/gpu synch ( #1869 )
* redesign for faster cpu/gpu synch
* load + more async CPU
* use command encoder API and move more ops to use it
* make fence back-end generic + CPU only fence
* faster build
* fix async eval
* fixes + handle temporaries
* fix / improve cpu conv
* remove unused status, fix siblings
* fix extensions
* fix
* fix no cpu build
* format
* comments
* fix perf regression, remove unnecessary abort
* fix events, task limit cpu
* fix waiting
* fix donation / temporaries in normalization
2025-03-06 19:23:38 -08:00
Awni Hannun
5245f12a46
always use json ( #1938 )
2025-03-06 15:35:56 -08:00
Chunyang Wen
a198b2787e
Remove unused modules ( #1936 )
2025-03-06 14:20:27 -08:00
Chunyang Wen
04edad8c59
Add doc string for path ( #1937 )
2025-03-06 14:20:09 -08:00
David Wisdom
392b3060b0
Fix typo in randint docstring ( #1932 )
...
This commit fixes a typo in the docstring for mlx.core.random.randint() by changing "roadcastable" to "broadcastable".
2025-03-05 21:48:00 -08:00
Chunyang Wen
85b34d59bc
Clean unused sys ( #1929 )
2025-03-05 13:48:03 -08:00
Awni Hannun
f599c11bc8
bump ( #1931 )
2025-03-05 13:16:53 -08:00
Angelos Katharopoulos
0792ff02ff
Only fail when 10 consecutive socket errors occur ( #1928 )
2025-03-05 13:16:19 -08:00
Alex Barron
fd0d63ba5b
Affine quant always in fp32 ( #1925 )
* do affine quant in fp32
* static cast
2025-03-04 17:50:19 -08:00
Abe Leininger
3835a428c5
Adds nuclear norm support ( #1894 )
* adjust norm unit test tolerance
2025-03-04 13:26:02 -08:00
Angelos Katharopoulos
9680f72cca
Add a multi optimizer ( #1916 )
2025-03-04 13:16:35 -08:00
Angelos Katharopoulos
a0737273d3
Allow debugging in distributed mode ( #1920 )
2025-03-04 13:01:10 -08:00
Awni Hannun
e613d0eaf0
SDPA support for small batch (over sequence) queries ( #1922 )
* batch query sdpa
* batch sdpa for query
2025-03-04 10:59:04 -08:00
Awni Hannun
6bcd6bcf70
fix donation in scan ( #1917 )
2025-03-03 11:30:59 -08:00
Awni Hannun
ba12e4999a
Use a heap for small sizes ( #1911 )
* use a heap for small sizes
* check if VM
2025-03-03 06:50:57 -08:00
Awni Hannun
4e7cd31d12
Fix slice data size ( #1913 )
* fix slice data size
* add test
2025-03-02 21:50:42 -08:00
Angelos Katharopoulos
5e6c130d93
RMS norm without scaling ( #1915 )
2025-02-28 20:26:57 -08:00
Angelos Katharopoulos
5d68082881
Ring docs ( #1829 )
2025-02-28 11:34:21 -08:00
Angelos Katharopoulos
607181644f
Add mlx.distributed_config script ( #1902 )
2025-02-28 11:16:39 -08:00
Jagrit Digani
89d327075f
Enabling fused attention for head dim 128 ( #1899 )
* Share KV smem
* Fix bfloat error
* Unroll O = S @ V loop
* Perf upgrade
* Remove commented out function
* Add -Wno-c++17-extensions flag to metal flags
* Add -Wno-c++17-extensions flag to metal extension flags
2025-02-26 10:02:06 -08:00
Angelos Katharopoulos
6bf00ef631
Fix ring of 2 and allow scalars in API ( #1906 )
2025-02-25 17:03:01 -08:00
Awni Hannun
7d042f17fe
Double for lapack ( #1904 )
* double for lapack ops
* add double support for lapack ops
2025-02-25 11:39:36 -08:00
Awni Hannun
28b8079e30
fix double type promotion ( #1901 )
2025-02-25 06:00:53 -08:00
Awni Hannun
7face5d9fd
fix cpu compile ( #1897 )
2025-02-24 14:10:30 -08:00
Awni Hannun
a44dc4bdb0
fix leaking objc ( #1898 )
2025-02-24 13:57:59 -08:00
Awni Hannun
2d0f384b6f
fix simd erf_inv ( #1896 )
2025-02-24 13:57:47 -08:00
Awni Hannun
8ff84b5c43
fix version and expose command queue getter ( #1892 )
2025-02-20 15:25:15 -08:00
Angelos Katharopoulos
10b271d963
Ring update ( #1885 )
2025-02-20 14:32:31 -08:00
Jesper Stemann Andersen
0ebc8a3d25
Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp ( #1890 )
2025-02-20 12:02:12 -08:00
Awni Hannun
bbda0fdbdb
Allow non-square lu ( #1889 )
2025-02-20 08:13:23 -08:00
Jesper Stemann Andersen
c86422bdd4
Added mlx::core::version() returning std::string(MLX_VERSION) ( #1819 )
* Added version.h providing mlx::core::version() returning std::string(MLX_VERSION)
Also, added MLX_VERSION_MAJOR, MLX_VERSION_MINOR, MLX_VERSION_PATCH, MLX_VERSION_NUMERIC, and accompanying functions.
* Added version.h to mlx.h
* Changed version int functions to be constexpr
* Formatting
* Added handling of MLX_VERSION where only the prefix has major.minor.patch format
* Changed version function to be constexpr
2025-02-19 20:30:19 -08:00
Awni Hannun
c707b2b0a6
Limit compile buffers ( #1887 )
* limit compile buffers
* maybe not flaky test
2025-02-19 20:28:13 -08:00
Angelos Katharopoulos
78ba24c37d
Raise an exception in the rope op if input is integer ( #1884 )
2025-02-19 14:43:39 -08:00
Angelos Katharopoulos
1a2cb72030
Ensure linspace always contains start and stop ( #1883 )
2025-02-19 13:53:20 -08:00
Abe Leininger
344a29506e
Enforce triangular matrix form in tri_inv ( #1876 )
* fix tri_inv bug
* Revert "fix tri_inv bug"
This reverts commit b74b290201 .
* Make sure that tri_inv returns a triangular matrix
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-02-19 12:42:33 -08:00
Angelos Katharopoulos
71de73a668
Fix convs by reverting #1803 ( #1882 )
2025-02-18 14:36:34 -08:00
Alex Barron
4c1dfa58b7
xor op on arrays ( #1875 )
2025-02-17 00:24:53 -08:00
Awni Hannun
5274c3c43f
compiler warnings are errors ( #1870 )
2025-02-17 00:07:49 -08:00
Angelos Katharopoulos
1762793989
Remove unused uniform ( #1867 )
2025-02-14 15:51:41 -08:00
Awni Hannun
6cec78d8f2
bump ( #1866 )
2025-02-14 13:09:34 -08:00
Jagrit Digani
2dc307f2e6
Winograd Update for Small batches ( #1803 )
* Build in padding to Winograd kernels
* Add new fused Winograd kernel
* Enable weight flipping in Winograd kernels
2025-02-14 13:08:13 -08:00
Awni Hannun
7aea5b1895
Allow dynamic ops per buffer based on dispatches and memory ( #1864 )
* Allow dynamic ops per buffer based on dispatches and memory
* add initial arch values
2025-02-13 19:18:22 -08:00
Ronan Collobert
9733e16496
fix function pointer ( #1865 )
2025-02-13 18:46:11 -08:00
Alex Barron
7f2d1024f3
add f8_e4m3 loading ( #1859 )
2025-02-13 17:10:03 -08:00
Awni Hannun
428f589364
Revert "More buffer donation in some cases ( #1858 )" ( #1863 )
This reverts commit d274ae77f2 .
2025-02-13 14:21:44 -08:00
Alex Barron
5cd97f7ffe
Bitwise Inverse ( #1862 )
* add bitwise inverse
* add vmap + fix nojit
* inverse -> invert
* add to compile + remove unused
2025-02-13 08:44:14 -08:00
Awni Hannun
e425dc00c0
Faster small batch qmv ( #1861 )
* faster small batch qmv
* swap batch and block dims for qvm and qmv regular
2025-02-12 22:02:36 -08:00
Awni Hannun
d274ae77f2
More buffer donation in some cases ( #1858 )
* more donation
* fix
* add test
2025-02-12 19:41:37 -08:00
Alex Barron
55c5ac7820
fix int64 bug ( #1860 )
2025-02-12 19:23:46 -08:00
Angelos Katharopoulos
0145911bea
Fixes output donation for IO ops on the GPU ( #1857 )
2025-02-12 10:52:30 -08:00
Awni Hannun
0a5215693e
Fix grad copies ( #1854 )
* fix grad with copies
* add test
* add test
2025-02-11 15:26:42 -08:00
Awni Hannun
2a45056ba8
Cycle leak break ( #1856 )
* detect and break leaks in custom function
* detect and break leaks in custom function
2025-02-11 14:45:02 -08:00
Cheng
142b77751d
Fix compilation error on Windows ( #1844 )
2025-02-10 19:53:05 -08:00
Abe Leininger
a5ededf1c3
CPU LU factorization and linear solvers ( #1451 )
* linalg solve backend
* nits
* more nits + fix
* luf primitive and lu, solve, and solve_triangular backends
* changes / nits
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-02-10 12:32:24 -08:00
Franck Verrot
7df3f792a2
Ensure Conv2D and Conv3D's kernel sizes aren't trimmed ( #1852 )
Before the change, this snippet:
```
print(nn.Conv1d(1, 32, 3, padding=1))
print(nn.Conv2d(1, 32, (3, 3), padding=1))
print(nn.Conv3d(1, 32, (3, 3, 3), padding=1))
```
would output:
```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3,), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```
After the change, the output will be:
```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```
2025-02-10 06:27:01 -08:00
Angelos Katharopoulos
9eb7d7362f
Fix Split::vmap ( #1845 )
2025-02-08 09:22:13 -08:00
Awni Hannun
1c0c118f7c
Fp64 on the CPU ( #1843 )
* add fp64 data type
* clean build
* update docs
* fix bug
2025-02-07 15:52:22 -08:00
Awni Hannun
1a1b2108ec
bump ( #1840 )
2025-02-06 11:53:24 -08:00
Jagrit Digani
b6c6552d20
Add missing #pragma once ( #1838 )
2025-02-06 11:11:22 -08:00
Awni Hannun
83a0340fa7
allow command ( #1836 )
2025-02-06 10:32:24 -08:00
Nripesh Niketan
a62fc1b39f
chore: pre-commit bump ( #1837 )
2025-02-06 08:55:01 -08:00
Awni Hannun
af1b725fda
Fix a couple of slicing bugs ( #1827 )
* fix a few bugs
* fix conv grad
* speedup test
* comment
2025-02-05 19:50:08 -08:00
Awni Hannun
9174606d4c
fix sort ( #1835 )
2025-02-05 17:16:27 -08:00
Awni Hannun
ca305afdbe
loading empty list is ok when strict = false ( #1834 )
2025-02-05 16:19:27 -08:00
Awni Hannun
fe5987b81d
faster sort ( #1831 )
2025-02-05 06:10:22 -08:00
Awni Hannun
a229c8cef0
don't duplicate malloc with custom kernel init ( #1830 )
2025-02-04 13:20:57 -08:00
Jesper Stemann Andersen
f6c0499b8d
Resolved ambiguity in mlx::core::take_along_axis ( #1822 )
* Resolved ambiguity in mlx::core::take_along_axis
Detected by GCC 10 on riscv64-linux-gnu.
* Formatted
* Removed superfluous parentheses in random_tests.cpp
2025-02-04 06:06:17 -08:00
Awni Hannun
1156c84e86
Refactor common into cpu specific and truly common ( #1817 )
* refactor
* fix extension example
* fix no-cpu
2025-02-03 15:58:02 -08:00
Awni Hannun
ec7c7def40
no line buffer for mpi jobs ( #1825 )
2025-02-03 12:02:15 -08:00
Jesper Stemann Andersen
2d8e667400
MinGW support ( #1806 )
* Changed /bin/bash to bash for generating compiling preamble
* Treat MinGW like MSVC in jit_compiler w.r.t. WEXITSTATUS
* Solved ambiguity wrt. bernoulli test shape
* Disabled distributed/ring on Windows
* Fixed jit_compiler command wrt. MinGW
* Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD
2025-02-01 12:40:06 -08:00
Awni Hannun
80c863b972
Remove accelerate/ ( #1816 )
* remove accelerate
* comments
* neon reduction
2025-02-01 07:18:26 -08:00
Angelos Katharopoulos
f5cc1eea72
Allow different value dimensions in sdpa_vector ( #1811 )
2025-01-31 20:58:59 -08:00
Awni Hannun
b7c9f1d38f
scatter axis + gather axis primitives ( #1813 )
* scatter axis + gather axis primitives
* add transforms
* comment
2025-01-31 20:48:08 -08:00
Awni Hannun
c6fc07f1f4
Unify CPU matmuls, remove unused accelerate conv ( #1814 )
* unify matmuls
* Update mlx/backend/common/matmul.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-01-31 14:43:37 -08:00
Angelos Katharopoulos
ded914f442
Small distributed launch helper ( #1810 )
2025-01-29 17:55:04 -08:00
Awni Hannun
4758c8baa1
Start to cleanup/unify accelerate and common back-ends (Part 1/N) ( #1777 )
* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv
2025-01-29 14:34:49 -08:00
Awni Hannun
7064fed1b1
Minor update on MPI docs ( #1805 )
2025-01-28 11:00:08 -08:00
Awni Hannun
1017ac4a9e
add dilation for conv 3d layers + test for 3d conv w/ dilation ( #1802 )
2025-01-28 06:17:07 -08:00
Angelos Katharopoulos
ccb61d7aae
Ring distributed backend ( #1784 )
2025-01-27 22:15:01 -08:00
Awni Hannun
2235dee906
catch stream errors earlier to avoid aborts ( #1801 )
2025-01-27 14:05:43 -08:00
Awni Hannun
28091aa1ff
allow build python lib without specifying path ( #1799 )
2025-01-27 11:22:35 -08:00
Awni Hannun
121d9a0702
Fix rope fallback to not upcast ( #1797 )
* fix rope fallback to not upcast
* Update mlx/fast.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2025-01-26 19:07:21 -08:00
Nick
0cea88bcc5
Use @ matrix multiplication syntax to document matrix-matrix multiplication ( #1793 )
Co-authored-by: Nick Thompson <nicholas_a_thompson@apple.com >
2025-01-25 16:02:36 -08:00
Angelos Katharopoulos
72146fc4cd
Einsum ellipsis ( #1788 )
2025-01-25 01:28:03 -08:00
Awni Hannun
e6a7ab9675
non square qr ( #1783 )
2025-01-21 14:07:47 -08:00
Angelos Katharopoulos
1f4c127fb9
Move some kernels to get_template_definition ( #1782 )
2025-01-21 08:59:44 -08:00
Awni Hannun
90532b1f37
recompile when shapeless is different ( #1776 )
2025-01-20 21:07:10 -08:00
Awni Hannun
a8666a757a
fix shapeless compile on ubuntu24 ( #1775 )
2025-01-18 06:04:36 -08:00
Awni Hannun
a4667da1eb
Faster synchronization Fence primitive ( #1773 )
* try faster synchronization
move event
fixes
update bench
fix
fix
* non-functioning kernel
* try alternative fence
* cleanup barrier
* get rid of event_fence
* update benchmarks
* doc string in metal fence
2025-01-17 18:42:19 -08:00
Awni Hannun
0c259961ac
matmul jvps ( #1772 )
2025-01-17 10:36:26 -08:00
Awni Hannun
f288db8d34
Fix synchronization bug for in stream async works ( #1768 )
2025-01-15 06:07:34 -08:00
Awni Hannun
33421c1dd3
Limit grad recursion depth by not recursing through non-grad inputs ( #1764 )
* limit grad recursion depth
* add grad of module test
2025-01-14 14:33:18 -08:00
Nripesh Niketan
5cc5201914
feat: Add orthogonal initializer and corresponding tests ( #1651 )
* feat: Add orthogonal initializer and corresponding tests
* lint
* Add acknowledgements
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2025-01-13 07:29:20 -08:00
Awni Hannun
252e423e81
fix and cleanup event signal/wait for metal ( #1765 )
2025-01-10 18:37:26 -08:00
wrmsr
a4a2764a52
Fix broadcast_arrays python sig ( #1763 )
2025-01-10 12:33:26 -08:00
Cheng
ab8e832c18
0ul is not size_t on MSVC ( #1762 )
2025-01-10 12:33:11 -08:00
Angelos Katharopoulos
1ce0c0fcb0
Bump version ( #1761 )
2025-01-09 13:48:20 -08:00
Awni Hannun
657f466402
use sdpa and exportable functions in transformer multi head attention ( #1760 )
2025-01-09 13:11:55 -08:00
Alex Barron
c7b0300af5
Fix batched qmv bug ( #1758 )
2025-01-09 11:45:57 -08:00
Awni Hannun
da8c885784
Simplify removes no-ops from the tape ( #1759 )
* simplify removes no-ops from the tape
* comment
2025-01-09 11:23:19 -08:00
Awni Hannun
1ccaf80575
Dynamic broadcasting for shapeless compile/export ( #1722 )
* working towards dynamic broadcast
* shapeless broadcast
* fix build + nits
* use broadcast arrays in quantize matmul
* some cleanup / consistency
* mend
* some comments
* add vjp, jvp for broadcast axes
2025-01-09 11:04:24 -08:00
Cheng
ec36bfa317
Include command stdout in error message ( #1756 )
* Include command stdout in error message
* On Windows pclose returns the exit code
2025-01-08 07:17:03 -08:00
Cheng
b8f76f717a
Print exceptions in eval_cpu/eval_gpu and abort ( #1754 )
2025-01-08 06:31:09 -08:00
Awni Hannun
d1766f2c70
Add boolean mask support in vector SDPA ( #1757 )
2025-01-07 20:24:53 -08:00
Awni Hannun
516ded618b
Dynamic slicing ( #1741 )
* dynamic slice and slice update
* python bindings + tests + fix set item
* fix compile issue
* comment
* fix jit
2025-01-07 14:02:16 -08:00
Jesper Stemann Andersen
c9c81d0584
Added additional missing unordered_map include that fixes build on FreeBSD ( #1755 )
2025-01-07 08:27:55 -08:00
Angelos Katharopoulos
545f84d905
Refactor distributed backend ( #1752 )
2025-01-06 17:33:15 -08:00
Awni Hannun
d5ec172c95
Allow boolean mask in sdpa ( #1753 )
* allow boolean mask in sdpa
* more permissive donation in ternary
2025-01-06 16:57:07 -08:00
Angelos Katharopoulos
25b3a3e541
Optionally specify names for arrays when exporting ( #1749 )
2025-01-06 13:07:46 -08:00
Awni Hannun
058d6ce683
mpi send use input as output ( #1750 )
...
* mpi send use input as output
* move earlier
2025-01-06 06:08:43 -08:00
Angelos Katharopoulos
eab93985b8
Update custom function docs ( #1748 )
2025-01-03 16:35:25 -08:00
Awni Hannun
b51d70a83c
export docs ( #1747 )
2025-01-03 15:04:17 -08:00
Awni Hannun
259025100e
Fix nd ternary on GPU ( #1746 )
2025-01-03 11:52:17 -08:00
Awni Hannun
c9d30aa6ac
MLX in C++ example ( #1736 )
...
* MLX in C++ example
* nits
* fix docs
2025-01-02 19:09:04 -08:00
Angelos Katharopoulos
8544b42007
Add namespace ( #1745 )
2025-01-02 16:49:23 -08:00
Awni Hannun
6fa0501387
Fix concatenate/slice_update vjp + reduce binary size ( #1735 )
...
* fix concatenate vjp + reduce binary size
* also cast in slice update
2025-01-02 16:36:33 -08:00
Awni Hannun
ae69cb15e9
shapeless compile in docs and partially shapeless reshape ( #1742 )
2025-01-02 16:24:42 -08:00
Awni Hannun
a64a8dfe45
fix extension ( #1740 )
2025-01-02 16:16:16 -08:00
Venkata Naga Aditya Datta Chivukula
491fa95b1f
Added Kronecker Product ( #1728 )
2025-01-02 16:00:34 -08:00
Danilo Peixoto
92ec632ad5
Fix Distributed Communication documentation ( #1731 )
...
* Add missing `size()` method call for group
2025-01-02 14:08:38 -08:00
Cheng
8ecdfb718b
Fix export.cpp compilation with MSVC ( #1737 )
2024-12-29 06:56:30 -08:00
Awni Hannun
4ba0c24a8f
Export / import functions to / from a file ( #1642 )
...
* export and import functions
* refactor + works for few primitives
* nit
* allow primitives with state
* nit
* nit
* simplify serialize / deserialize
* fix for constants
* python bindings
* maybe fix serialize failure case
* add example
* more primitives, training kind of works
* same result for python and c++
* some fixes
* fix export
* template it up
* some simplification
* rebase
* allow kwargs and multiple functions
* exporter
* more primitives for exporting
* deal with endianness
* handle invalid stream
* add docstring
2024-12-24 11:19:13 -08:00
Cheng
935c8c4bb1
Make mx.compile work on Windows ( #1697 )
...
* Invoke MSVC on Windows in mx.compile
* Export kernel symbol on MSVC
* Remove unused template
* Parse env pairs in a robust way
* No need of cassert
* Remove unnecessary helpers
* Fix right trim
* Move command building to a separate file
* Missing header
* Do not pollute cwd with cl.exe
* Simplify str concat
* Pass output dir
* Fix styling
2024-12-24 07:02:33 -08:00
Valentin Roussellet
88f993da38
Explicit parentheses around some logical operators ( #1732 )
...
* fix some warnings
* format
2024-12-24 07:02:20 -08:00
Awni Hannun
ebfe64b92d
shapeless slice update and broadcast when possible ( #1727 )
2024-12-23 11:25:15 -08:00
Awni Hannun
0308e9af71
Allow offset to be an mx.array for mx.fast.rope ( #1724 )
...
* allow offset for rope
* comment
2024-12-19 15:51:44 -08:00
Awni Hannun
c3628eea49
Add mx.finfo and use it when making causal mask ( #1726 )
...
* finfo
* fixes
* docs
2024-12-19 14:52:41 -08:00
Awni Hannun
e03f0372b1
More shape type ( #1705 )
...
* more shape type
* fix
2024-12-19 08:08:20 -08:00
Alex Barron
f17536af9c
More lenient mask type check in SDPA ( #1723 )
...
* check mask type
* require promotion
2024-12-18 19:41:38 -08:00
Cheng
ed4ec81bca
Link python extension with mlx statically on Windows ( #1716 )
...
* Link python extension with mlx statically on Windows
* More readable code
2024-12-18 19:26:04 -08:00
Awni Hannun
7480059306
track resource limit and throw if exceeded ( #1718 )
2024-12-18 18:45:58 -08:00
Awni Hannun
8bae22b0fa
fix deletion of non-evaled arrays with siblings ( #1714 )
2024-12-18 18:45:36 -08:00
Alex Barron
49c34c4161
check mask type ( #1721 )
2024-12-18 14:25:18 -08:00
Awni Hannun
5548fcc96d
fix synch race ( #1719 )
2024-12-18 12:25:16 -08:00
Cheng
070bd433ab
Shorter kernel name for Windows ( #1701 )
...
* Shorter kernel name for Windows
* Only hash the clipped part
2024-12-17 18:51:38 -08:00
Cheng
c8fb54951a
Define NOMINMAX before windows.h ( #1715 )
2024-12-17 18:51:24 -08:00
Awni Hannun
f110357aaa
Bump nanobind to 2.4 + fix ( #1710 )
...
* bump nanobind to 2.4 + fix
* fix
2024-12-17 10:57:54 -08:00
Tomohiro Oga
a6b426422e
add cubic to type hinting for upsample ( #1709 )
2024-12-17 07:30:23 -08:00
Awni Hannun
d03c01dfbc
fix unflatten vjp ( #1708 )
2024-12-16 18:37:57 -08:00
Jesper Stemann Andersen
a82996e9fb
io/load: Enabled pread implementation for mingw32 ( #1706 )
2024-12-16 07:20:45 -08:00
Cheng
af5a614aad
Eval before cleanup so model file is unlocked ( #1702 )
2024-12-14 21:41:49 -08:00
Cheng
f9640e049d
Install mlx.dll into the same dir with python bindings on Windows ( #1690 )
...
* Install mlx.dll into the same dir with python bindings on Windows
* Set BUILD_SHARED_LIBS for dlfcn-win32
* Update cmake requirements to 3.25
* Fix cmake style
2024-12-13 19:50:39 -08:00
Cheng
4768c61b57
Make sure gguf_ctx is closed when error happens ( #1699 )
2024-12-13 19:50:19 -08:00
Cheng
dfccd17ab9
Use psutil to get memory info on Windows ( #1700 )
2024-12-13 19:50:13 -08:00
Cheng
635117c5d4
Read/write files in binary mode ( #1698 )
2024-12-13 17:37:05 -08:00
Awni Hannun
50f3535693
Use expand_dims / unflatten / etc in more places ( #1696 )
...
* use expand_dims / unflatten in a couple more places
* few more
* few more
* fix
2024-12-12 17:00:44 -08:00
Awni Hannun
9111999af3
Fix small sort with metal validation ( #1695 )
2024-12-12 09:21:45 -08:00
Awni Hannun
6bd28d246e
Allow no copy negative strides in as_strided and slice ( #1688 )
...
* allow no copy negative strides in as_strided and slice
* fix jit
* fix jit
2024-12-12 08:59:45 -08:00
Cheng
4d595a2a39
Make compiled preamble work in MSVC ( #1675 )
...
* Make compiled preamble work in MSVC
* Remove logging
* Only use powershell for MSVC
2024-12-12 08:55:49 -08:00
Awni Hannun
3a21f61772
Fix build ( #1693 )
2024-12-11 23:56:25 -08:00
Awni Hannun
4e1e9520e1
Flatten and unflatten ( #1692 )
...
* flatten and unflatten
* fix grad
* fix shape infer
* use squeeze + unsqueeze in get_item
2024-12-11 21:51:37 -08:00
Cheng
0bf19037ca
Remove "using namespace mlx::core" in python/src ( #1689 )
2024-12-11 15:45:39 -08:00
Awni Hannun
f3dfa36a3a
Fix x86 tests ( #1691 )
...
* fix x86 tests
* comment
2024-12-11 07:47:18 -08:00
Cheng
4f9b60dd53
Remove "using namespace mlx::core" in benchmarks/examples ( #1685 )
...
* Remove "using namespace mlx::core" in benchmarks/examples
* Fix building example extension
* A missing one in comment
* Fix building on M chips
2024-12-11 07:08:29 -08:00
Awni Hannun
f76a49e555
ExpandDims primitive (#1687 )
...
* add squeeze primitive
* simplify squeeze, use in gather
* fix
* fix
* fix
* fix
* fix no cpu
* use squeeze in matmul and friends
* expand dims primitive
* comment
2024-12-10 16:39:07 -08:00
Cheng
310ad8d9db
Build OpenBLAS from source code for MSVC ( #1674 )
...
* Download OpenBLAS binaries when building with MSVC
* Download dlfcn-win32
* Link with dlfcn-win32 correctly
* Build OpenBLAS from source code
* Link with openblas statically
* Link with BLAS privately
2024-12-10 16:14:44 -08:00
Cheng
56db268f47
Provide a pread implementation for MSVC ( #1666 )
2024-12-10 15:55:53 -08:00
Cheng
92ab6bdeb8
Fix shared library not exporting symbols on Windows ( #1684 )
...
* Fix shared library not exporting symbols on Windows
* Function name style
2024-12-10 13:59:14 -08:00
Cheng
0070e360a1
Disable MSVC warnings ( #1680 )
2024-12-09 19:41:14 -08:00
Amethyst Shen
9df8fed046
Metal-cpp version bump ( #1668 )
...
* Metal-cpp version bump
Apple has released the stable version of Metal-cpp for macOS 15 and iOS 18. CMakeLists.txt is updated to build with it instead of the beta one.
* Fix style with cmake-format
2024-12-09 19:40:35 -08:00
Cheng
a59fae040f
Fix library output directory for MSVC ( #1681 )
2024-12-09 19:07:50 -08:00
Awni Hannun
29a620cab2
No reshapes in quantized embedding ( #1682 )
...
* no reshapes in quantized embedding
* fix inadvertent cast
* add tol
2024-12-09 18:57:38 -08:00
Cheng
87d7a2520e
Use Py_ssize_t in python bindings ( #1678 )
...
* Use Py_ssize_t in python bindings
* Args passed to std::max must be same type
2024-12-09 12:59:19 -08:00
Awni Hannun
40c62c1321
Use int64 stride everywhere ( #1671 )
...
* use int64 stride everywhere
* fix ext
* fix ext
* more shape + cleanup
* one more
* few more
2024-12-09 11:09:02 -08:00
Awni Hannun
35b412c099
Fix compile hasher for string constants. ( #1677 )
...
* fix hash
* add test
* nit
2024-12-09 09:26:18 -08:00
Cheng
d0f471cff7
Using math defines requires switch in MSVC ( #1665 )
...
* Using math defines requires switch in MSVC
* Fix more math macros
* Fix type
* Remove _MSC_VER guard for math defines
2024-12-08 08:16:28 -08:00
Cheng
6f316b8bf5
Use int64_t instead of ssize_t ( #1673 )
2024-12-07 20:10:44 -08:00
Cheng
7c10c93a1f
Convert filesystem path to std::string explicitly ( #1672 )
2024-12-07 20:10:06 -08:00
Cheng
d92ea094f1
Use && instead of and ( #1663 )
...
* Use && instead of and
* Remove "and" in ops.cpp
2024-12-07 18:26:39 -08:00
Cheng
6ae5423b4a
Do not pass integers to isnan ( #1664 )
2024-12-07 18:26:23 -08:00
Cheng
9635cffdc8
Include io.h in MSVC for IO functions ( #1661 )
2024-12-07 18:26:06 -08:00
Cheng
96986fb362
Use auto* for pointers ( #1662 )
2024-12-07 18:25:40 -08:00
Cheng
3ceb341a75
Use correct complex type for MSVC ( #1660 )
2024-12-07 18:25:22 -08:00
Awni Hannun
50fa705125
patch bump ( #1656 )
2024-12-06 13:16:19 -08:00
Awni Hannun
69a2991614
allow compiling lambdas in C++ ( #1650 )
...
* allow compiling lambdas in C++
* fix test
* more tests
* auto detect capture-less lambda
2024-12-06 13:13:21 -08:00
mt_caret
fd3377dd1f
Support bias correction in Adam and AdamW optimizers ( #1640 )
2024-12-06 12:13:34 -08:00
Awni Hannun
d0b6cb0425
More primitives for compiling with shapeless ( #1653 )
...
* more shapeless and more Shape
* more shape
* fix
* fix
2024-12-06 11:29:18 -08:00
Alex Barron
95c4a2e3af
add back conditionaltype ( #1655 )
2024-12-06 11:12:01 -08:00
Awni Hannun
bc2a29f033
fix ( #1654 )
2024-12-06 10:48:58 -08:00
Nripesh Niketan
3bb5b4a302
Chore: Add default language in pre-commit and bump hooks ( #1652 )
2024-12-06 07:54:29 -08:00
Awni Hannun
fc88fd9097
Shape and Strides 1 / N ( #1645 )
...
* shape and stride type def
* more shape
2024-12-05 12:53:43 -08:00
Awni Hannun
c5b0928c1f
fix fallback ( #1646 )
2024-12-05 11:59:53 -08:00
Awni Hannun
e047fd977d
compile changes if stream changes ( #1644 )
2024-12-03 14:37:44 -08:00
Jagrit Digani
9d40e521d7
Stop matrix copies with new attention kernel ( #1639 )
2024-12-02 14:12:38 -08:00
Alex Barron
1445dcaa60
let class predicate specify quantization parameters ( #1638 )
2024-12-02 14:09:28 -08:00
Jesper Stemann Andersen
e4eeb4e910
Added missing unordered_map includes ( #1635 )
...
* Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fixes build on FreeBSD
2024-12-02 07:03:03 -08:00
Awni Hannun
aa86876813
fix transformer decoder post norm LN ( #1637 )
2024-12-02 07:02:17 -08:00
Jesper Stemann Andersen
974bb54ab2
CMake: Enabled using Accelerate on x86_64 / x64 ( #1625 )
...
* CMake: Enabled using Accelerate on x86_64 / x64
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
* CMake: Removed superfluous MLX_BUILD_ARM
2024-11-28 10:55:45 -08:00
Ikko Eltociear Ashimine
9bc2183a31
docs: update device.cpp ( #1632 )
...
unecessary -> unnecessary
2024-11-27 20:58:26 -08:00
Awni Hannun
d4b222b6d3
Fix some leaks and races ( #1629 )
...
* fix leak and fix potential race
* more leak fixes
* fix one more
2024-11-27 20:01:20 -08:00
Jesper Stemann Andersen
af2af818a6
Enables build for *-linux-musl ( #1627 )
...
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
2024-11-27 13:14:24 -08:00
Jesper Stemann Andersen
698e63a608
CMake: Build with dlfcn-win32 to have dlopen etc. on win32 ( #1628 )
...
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
2024-11-27 13:14:13 -08:00
Awni Hannun
211411faf2
fix large ops ( #1620 )
2024-11-24 09:17:10 -08:00
Awni Hannun
bb303c45a5
version ( #1617 )
2024-11-22 12:00:03 -08:00
Alex Barron
6f7986d592
Cleaner qmv/qvm ( #1616 )
2024-11-22 11:14:08 -08:00
Awni Hannun
7cbb4aef17
Doc fix ( #1615 )
2024-11-22 11:12:25 -08:00
Jagrit Digani
02bec0bb6d
Matrix Attention kernel ( #1610 )
...
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
2024-11-22 10:34:05 -08:00
Alex Barron
c79f6a4a8c
3 and 6 bit quantization ( #1613 )
...
* Support 3 and 6 bit quantization
2024-11-22 10:22:13 -08:00
Awni Hannun
0c5eea226b
Reduce specializations ( #1607 )
...
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes
2024-11-21 19:53:00 -08:00
Awni Hannun
dcca0d7477
contiguous op / prim ( #1612 )
2024-11-21 19:51:49 -08:00
Cocoa
0d5e7716ad
fix typo: accross -> across ( #1609 )
...
Signed-off-by: Cocoa <i@uwucocoa.moe>
2024-11-20 15:30:51 -08:00
Angelos Katharopoulos
d8c824c594
Formatting fixes ( #1606 )
2024-11-20 15:30:36 -08:00
Saanidhya
cb431dfc9f
Adds 3D pooling ( #1526 )
2024-11-19 16:45:24 -08:00
Awni Hannun
61d787726a
Fix view scalar bug segfault ( #1603 )
...
* fix view scalar bug
* fix view scalar bug
* one more fix
2024-11-19 10:54:05 -08:00
Angelos Katharopoulos
5e89aace9b
Fix concatenate vmap ( #1600 )
2024-11-19 10:44:04 -08:00
Awni Hannun
2af7e8a9a6
fix cmake version ( #1601 )
2024-11-19 08:45:05 -08:00
Awni Hannun
2419edd5b2
Faster indexing math in a few kernels ( #1589 )
...
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
2024-11-18 19:52:00 -08:00
Awni Hannun
bf481e8e5d
Fix sibling leak ( #1590 )
...
* add test
* fix + test
* fix fix
2024-11-18 19:17:01 -08:00
Awni Hannun
9d7fa6b8e6
Use osx deployment target to pick Metal version ( #1595 )
...
* choose metal based on deployment target rather than system version
* nit
* unused compile def
2024-11-18 19:16:49 -08:00
Angelos Katharopoulos
073076ac7d
2-Pass Sdpa Inference Kernel ( #1597 )
2024-11-18 17:31:53 -08:00
Awni Hannun
9bd03dd9b4
More buffer donation with no-ops ( #1591 )
...
* more donation
* fix test
* fix build
2024-11-18 08:35:41 -08:00
Awni Hannun
6931f84412
fix dispatch threads for a few kernels ( #1594 )
2024-11-18 08:35:25 -08:00
xnorai
16ec0556a0
Allocate raw JSON metadata buffer on the heap, and limit its size ( #1596 )
...
* Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors
2024-11-18 07:22:51 -08:00
Awni Hannun
610af352d4
Dispatch bf16 at run time when using the JIT ( #1584 )
...
* Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h
2024-11-15 16:54:36 -08:00
Awni Hannun
b35f1e3c9c
fix donation in sdpa ( #1587 )
2024-11-13 17:21:13 -08:00
Awni Hannun
dfa0b9aab4
Cpu fast quantize ( #1578 )
...
* cpu quantize
* fix
2024-11-08 20:10:39 -08:00
Alex Barron
a4c47b0276
OOB QMV fix ( #1579 )
...
* fix oob access in qmv
* skip more
* fix small case
2024-11-08 17:59:45 -08:00
Alex Barron
111fefd5e9
Fix OOB access in qmv ( #1577 )
...
* fix oob access in qmv
* skip more
2024-11-08 15:41:30 -08:00
Awni Hannun
c1fe1ef081
Bfs width limit ( #1568 )
...
* width limit
* fix
* large limit
* put env vars in env namespace
2024-11-08 15:00:46 -08:00
Awni Hannun
8c34c9dac4
throw for invalid case and remove test ( #1575 )
2024-11-08 12:04:03 -08:00
Awni Hannun
91c0277356
fix per-example mask + docs in sdpa ( #1574 )
2024-11-08 11:51:15 -08:00
Awni Hannun
9f0d5c12fc
Fully wrap the command encoder ( #1572 )
...
* fully wrap the command encoder
* use consistent style + fix extensions
2024-11-08 11:50:21 -08:00
Awni Hannun
59247c2b62
add groups in conv2d ( #1569 )
2024-11-07 13:57:53 -08:00
Awni Hannun
9a3842a2d9
fix ( #1566 )
2024-11-06 17:10:33 -08:00
Alex Barron
726dbd9267
v0.20.0 ( #1565 )
2024-11-05 12:37:57 -08:00
Awni Hannun
54f05e7195
Fix gather vmap ( #1563 )
...
* fix gather
* fix
2024-11-05 11:29:20 -08:00
Alex Barron
26be608470
Add split_k qvm for long context ( #1564 )
...
* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance
2024-11-05 11:25:19 -08:00
Angelos Katharopoulos
248431eb3c
Reductions update ( #1351 )
2024-11-04 22:25:16 -08:00
Awni Hannun
76f275b4df
error in rms for wrong size ( #1562 )
2024-11-04 13:24:02 -08:00
Awni Hannun
f1951d6cce
Use fewer barriers ( #1561 )
...
* use fewer barriers
* comment
2024-11-04 10:26:49 -08:00
Angelos Katharopoulos
62f297b51d
Sdpa fix ( #1558 )
2024-11-02 21:25:46 -07:00
Awni Hannun
09bc32f62f
No extra reshape ( #1557 )
...
* no extra reshape
* lint
2024-11-02 19:07:20 -07:00
Chris Offner
46d8b16ab4
Fix vmap example in docs ( #1556 )
2024-11-02 17:44:14 -07:00
Chris Offner
42533931fa
Fix typo "it's" -> "its" ( #1555 )
2024-11-02 06:06:34 -07:00
Awni Hannun
9bd3a7102f
add python 3.13 to circle ( #1553 )
2024-11-01 20:55:35 -07:00
Alex Barron
9e516b71ea
Add dispatchThreads to custom kernel doc ( #1551 )
...
* add dispatchThreads info
* update
* add link
2024-11-01 13:07:48 -07:00
Awni Hannun
eac961ddb1
patch ( #1550 )
2024-10-31 16:10:14 -07:00
Awni Hannun
57c6aa7188
fix multi output leak ( #1548 )
2024-10-31 09:32:01 -07:00
Awni Hannun
cde5b4ad80
patch ( #1546 )
2024-10-30 19:31:22 -07:00
Awni Hannun
4f72c66911
improvements to scatter / gather ( #1541 )
2024-10-30 19:30:54 -07:00
Jagrit Digani
960e3f0f05
Gemm update ( #1518 )
2024-10-30 19:30:28 -07:00
Awni Hannun
884af42da2
Fix thread group for large arrays ( #1543 )
...
* fix thread group for large arrays
* comment
* one more
2024-10-30 16:25:12 -07:00
Alex Barron
048fabdabd
Fix vmap constant output size ( #1524 )
...
* use inputs to determine output size
* remove noop vmap tests
2024-10-30 16:16:53 -07:00
Léo
917252a5a1
Add favicon to docs ( #1545 )
...
* add sphinx's html_favicon config
* removed unneeded newline
* ran pre-commit hooks
2024-10-30 13:54:13 -07:00
Carlo Cabrera
1a992e31e8
Skip using Residency sets in VMs ( #1537 )
...
* Skip using Residency sets in VMs
Attempting to use residency sets in a VM throws[^1]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [metal::Device] Unable to construct residency set.
Not quite sure if this is the best fix, but it does make the error go
away.
Note that it was previously possible to run simple programs that used
mlx in a VM prior to 0eb56d5be0. See
related discussion at Homebrew/homebrew-core#195627.
[^1]: https://github.com/Homebrew/homebrew-core/actions/runs/11525831492/job/32105148462#step:3:56
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* change residency check
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-29 19:37:23 -07:00
Awni Hannun
d2ff04a4f2
fix format ( #1539 )
2024-10-28 18:29:14 -07:00
Awni Hannun
015c247393
change wino dispatch condition ( #1534 )
2024-10-28 11:13:44 -07:00
Awni Hannun
d3cd26820e
Faster bits and bernoulli ( #1535 )
...
* faster bits and bernoulli
* fix bernoulli
2024-10-28 11:11:00 -07:00
Awni Hannun
91f6c499d7
fix ( #1529 )
2024-10-25 19:25:35 -07:00
Awni Hannun
35e9c87ab9
patch bump ( #1528 )
2024-10-25 13:13:23 -07:00
Awni Hannun
8e88e30d95
BFS graph evaluation order ( #1525 )
...
* bfs order
* try fix event issue
2024-10-25 10:27:19 -07:00
Awni Hannun
0eb56d5be0
Wired ( #1510 )
...
* expose residency sets as wire/unwire
* returns wired size
* fix
* runtime support check
* fix os check
* fix test
* fix no metal build
* docs
* nit
* nits in docs
* nits
2024-10-25 09:35:33 -07:00
Paul Hansel
f70764a162
Fix typo in build docs ( #1522 )
2024-10-24 20:55:06 -07:00
Awni Hannun
dad1b00b13
fix ( #1523 )
2024-10-24 19:17:46 -07:00
Venkata Naga Aditya Datta Chivukula
430ffef58a
[Feature] Added Sparse Initialization ( #1498 )
...
Co-authored-by: Saanidhyavats <saanidhyavats@gmail.com>
2024-10-24 12:31:24 -07:00
Alex Barron
3d17077187
Add mx.array.__format__ ( #1521 )
...
* add __format__
* actually test something
* fix
2024-10-24 11:11:39 -07:00
Angelos Katharopoulos
c9b41d460f
Working 64-bit scans ( #1506 )
2024-10-24 11:05:46 -07:00
xnorai
32972a5924
C++20 compatibility for fmt ( #1519 )
...
* C++20 compatibility for fmt
* Address review feedback
* Remove stray string
* Add newlines back
2024-10-24 08:54:51 -07:00
Dhruv Govil
f6afb9c09b
Remove use of vector<const T> ( #1514 )
2024-10-22 16:31:52 -07:00
Kashif Rasul
3ddc07e936
Eigenvalues and eigenvectors ( #1334 )
...
* initial eigvalsh
* add compute_vectors
* add compute_vectors_
* return a pair
* add eigh to return only eigenvectors
* fixed typo
* merge Eigvalsh and Eigh into a single primitive
* use the same primitive with the flag
* fix primitives
* use MULTI
* fix eval_gpu
* fix declaration
* rename EighPrimitive to Eigh
* tests
* tests
* fix rebase and format
* cleanup lapack
* format
* add cblas.h
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:18:48 -07:00
Awni Hannun
c26208f67d
Remove Hazard tracking with Fences ( #1509 )
...
* remove hazard tracking
* with fence map
* no hazard tracking with fences
* nits
* fix fence retain
* cleanup
* fix quantized rebase
2024-10-21 19:33:32 -07:00
Alex Barron
d15fa13daf
Batched Quantized Matmul + Fast Small QMV ( #1503 )
...
* add fast qmv for small dims
* fix test
* batched cpu
* add batched template param
* refactor metal quantized.cpp
2024-10-21 16:23:17 -07:00
Awni Hannun
58a855682c
v0.19.0 ( #1502 )
2024-10-18 11:55:18 -07:00
Awni Hannun
92d7cb71f8
Fix compile ( #1501 )
...
* fix compile
* fix space
2024-10-18 11:06:40 -07:00
Angelos Katharopoulos
50d8bed468
Fused attention for single query ( #1497 )
2024-10-18 00:58:52 -07:00
Awni Hannun
9dd72cd421
fix gumbel ( #1495 )
2024-10-17 13:52:39 -07:00
Awni Hannun
343aa46b78
No more 3.8 ( #1493 )
2024-10-16 17:51:38 -07:00
Awni Hannun
b8ab89b413
Docs in ci ( #1491 )
...
* docs in circle
2024-10-15 17:40:00 -07:00
Awni Hannun
f9f8c167d4
fix submodule stubs ( #1492 )
2024-10-15 16:23:37 -07:00
Awni Hannun
3f86399922
Real and Imag ( #1490 )
...
* real and imag
* fix
* fix
2024-10-15 16:23:15 -07:00
LastWhisper
2b8ace6a03
Typing the dropout. ( #1479 )
2024-10-15 06:45:46 -07:00
Awni Hannun
0ab8e099e8
Fix cpu segfault ( #1488 )
...
* fix cpu segfault
* nit in tests
2024-10-14 16:17:03 -07:00
Awni Hannun
020f048cd0
A few updates for CPU ( #1482 )
...
* some updates
* format
* fix
* nit
2024-10-14 12:45:49 -07:00
Awni Hannun
881615b072
Faster metal compiled kernels + some fixes ( #1486 )
...
* bump mac tests to use py39
* work per thread for compiled kernels
* fix for large arrays
* fix
2024-10-14 12:45:38 -07:00
Awni Hannun
0eef4febfd
bump mac tests to use py39 ( #1485 )
2024-10-14 10:40:32 -07:00
Awni Hannun
b54a70ec2d
Make push button linux distribution ( #1476 )
...
* try again
* try again
* try again
* try again
* try again
* try again
* try again
* try again
* .circleci/config.yml
* one more fix
* nit
2024-10-14 06:21:44 -07:00
Awni Hannun
bf6ec92216
Make the GPU device more thread safe ( #1478 )
...
* gpu stream safety
* comment
* fix
2024-10-12 17:49:15 -07:00
Awni Hannun
c21331d47f
version bump ( #1477 )
2024-10-10 13:05:17 -07:00
Awni Hannun
e1c9600da3
Add mx.random.permutation ( #1471 )
...
* random permutation
* comment
2024-10-08 19:42:19 -07:00
Awni Hannun
1fa0d20a30
consistently handle all -inf in softmax ( #1470 )
2024-10-08 09:54:02 -07:00
Awni Hannun
3274c6a087
Fix array is_available race cases ( #1468 )
2024-10-07 19:13:50 -07:00
Angelos Katharopoulos
9b12093739
Add the roll op ( #1455 )
2024-10-07 17:21:42 -07:00
Awni Hannun
f374b6ca4d
Bump nanobind to 2.2 ( #1461 )
...
* bump nanobind
* extension version for tests
2024-10-07 16:52:40 -07:00
Awni Hannun
0070e1db40
Fix deep recursion with siblings ( #1462 )
...
* fix recursion with siblings
* fix
* add test
* increase tol
2024-10-07 06:15:33 -07:00
Awni Hannun
95d04805b3
Fix complex power on Metal ( #1460 )
2024-10-06 19:58:30 -07:00
Awni Hannun
e4534dac17
Conv grad with groups + bugfix ( #1449 )
...
* fix bug in flipped conv with groups, start of grad for groups
* fix
* fix
* fix + test
2024-10-06 07:08:53 -07:00
Angelos Katharopoulos
fef3c4ec1d
Fix mpi test in CI ( #1456 )
...
* Fix mpi test in CI
* Set bind to none
2024-10-06 06:09:17 -07:00
Awni Hannun
1bdc038bf9
fix argpartition + faster {arg} sorts / partitions ( #1453 )
2024-10-03 14:21:25 -07:00
Awni Hannun
5523d9c426
faster cpu indexing ( #1450 )
2024-10-03 13:53:47 -07:00
Angelos Katharopoulos
d878015228
Fix normalization check_input ( #1452 )
2024-10-03 13:26:56 -07:00
Cheng
5900e3249f
Fix building on Linux ( #1446 )
2024-09-30 07:00:39 -07:00
Angelos Katharopoulos
bacced53d3
Fix row reduce with very few rows ( #1447 )
2024-09-29 20:00:35 -07:00
Lucas Newman
4a64d4bff1
Add support for grouped 1D convolutions to the nn API ( #1444 )
...
* Fix the weight shape for grouped convolutions from the nn API.
* Add tests.
* Pre-commit formatting.
* Add input validation.
* Use integer division instead of casting.
* docs
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 06:41:07 -07:00
Awni Hannun
b1e2b53c2d
bump ( #1445 )
2024-09-27 13:53:02 -07:00
Awni Hannun
11354d5bff
Avoid io timeout for large arrays ( #1442 )
2024-09-27 13:32:14 -07:00
Awni Hannun
718aea3f1d
allow take to work with integer index ( #1440 )
2024-09-26 15:58:03 -07:00
Awni Hannun
5b6f38df2b
Faster cpu ops ( #1434 )
...
* faster binary and cleaner copy
* use recursive template for other ops
* more cleanup
* fix from cleanup
* more clean
* fix binary
* use contiguous iterator
* add 3d
* nits
* fix
* fix?
* fix
* fix rebase
2024-09-26 09:19:13 -07:00
Awni Hannun
0b4a58699e
Some overhead reductions in mx.fast.metal_kernel ( #1437 )
...
* some overhead reductions
* fix
* use +=
* use more +=
2024-09-25 17:25:21 -07:00
Awni Hannun
4f9f9ebb6f
Faster Metal unary and binary for general case ( #1431 )
...
* faster unary and binary for general case
* update ternary + jit fix
* fix jit
* unary work per thread
2024-09-25 12:07:43 -07:00
Awni Hannun
afc9c0ec1b
dtype is copy assignable ( #1436 )
2024-09-25 12:07:13 -07:00
Awni Hannun
195b429d99
Put along axis + fixes for partition grad ( #1430 )
...
* put along axis, fixes for partition grad
* zeros for arg reduce
2024-09-23 10:03:38 -07:00
Luke Carlson
2b878e9dd7
Create CITATION.cff ( #1425 )
2024-09-20 11:39:46 -07:00
Awni Hannun
67b6bf530d
Optimization for general ND copies ( #1421 )
2024-09-17 17:59:51 -07:00
Nripesh Niketan
6af5ca35b2
feat: add cross_product ( #1252 )
...
* feat: add cross_product
* lint
* python binding
* refactor: Improve error message for cross_product function
* refactor: more close to numpy cross product
* refactor: improve error message for cross_product function
* finish
* fix acks
* allow old numpy
* doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-17 13:12:43 -07:00
Awni Hannun
4f46e9c997
More fixes for arrays with large sizes ( #1405 )
...
* compile works for big arrays when contiguous
* style
* nits in docs
* a bunch more stuff
* update jit
* update jit
* use constant for shapes and strides and remove elem_to_loc overload
* use kernel instantiation
* docs nits
* update binary and ternary
* comments
2024-09-17 12:46:31 -07:00
Awni Hannun
c6739ba7f3
Faster RNN layers ( #1419 )
...
* faster rnn
* use addmm
2024-09-17 06:04:19 -07:00
Angelos Katharopoulos
914409fef9
Data parallel helper ( #1407 )
2024-09-16 18:17:21 -07:00
jjuang-apple
8d68a3e805
remove fmt dependencies from MLX install ( #1417 )
2024-09-16 13:32:28 -07:00
jjuang-apple
6bbcc453ef
avoid using find_library to make install truly portable ( #1416 )
2024-09-16 13:21:32 -07:00
Awni Hannun
d5ed4d7a71
override class function ( #1418 )
2024-09-16 13:21:04 -07:00
Nripesh Niketan
669c27140d
Chore: add pre-commit hook for cmake ( #1362 )
...
* reset and lint
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-16 12:53:01 -07:00
Max-Heinrich Laves
adcc88e208
Conv cpu improvements ( #1410 )
2024-09-15 18:45:10 -07:00
Awni Hannun
d6492b0163
fix clip ( #1415 )
2024-09-14 16:09:09 -07:00
Awni Hannun
b3f52c9fbe
ensure io/comm streams are active before eval ( #1412 )
2024-09-14 06:17:36 -07:00
c0g
bd8396fad8
Fix typo in transformer docs ( #1414 )
2024-09-14 06:05:15 -07:00
Angelos Katharopoulos
d0c58841d1
Patch bump ( #1408 )
2024-09-12 16:44:23 -07:00
Angelos Katharopoulos
881f09b2e2
Allow querying the allocator for the buffer size ( #1404 )
2024-09-11 21:02:16 -07:00
Awni Hannun
8b30acd7eb
fix module attribute set, reset, set ( #1403 )
2024-09-11 16:30:42 -07:00
Awni Hannun
02efb310ca
Xcode 160 ( #1384 )
...
* xcode 16.0 with debug tests
* limit nproc for builds
* vmap bug
* assert bug
* run python tests in debug mode
* fix view, bool copies preserve bits
* actual view fix
2024-09-10 15:15:17 -07:00
Awni Hannun
e7e59c6f05
Fix copying scalars by adding fill_gpu ( #1402 )
...
* fix copying scalars by adding fill_gpu
* Another copy scalar changed to fill
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Awni Hannun
3ae6aabe9f
throw for certain cases of non captured inputs in compile ( #1401 )
2024-09-09 14:54:31 -07:00
xnorai
dc627dcb5e
Replace the use of result_of_t with invoke_result_t ( #1397 )
...
* Fix C++20 incompatibility
* Fix C++20 incompatibility
2024-09-06 19:52:57 -07:00
Max-Heinrich Laves
efeb9c0f02
Transposed Convolution ( #1245 )
...
* initial implementation for conv_transpose
ran pre-commit
implemented conv_transpose
updated conv_general docstring
updated conv_general docstring
updated code comments
removed commented run_conv_checks
updated acknowledgments
added missing entry to ops.rst
added op to nn.layers
resolved merge conflicts
* removed ConvolutionTranspose primitive as suggested by reviewer
removed ConvolutionTranspose primitive as suggested by reviewer
* remove transpose flag, add another test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-06 19:52:38 -07:00
Awni Hannun
ba3e913c7a
Simplifications for MLX C ( #1396 )
...
* simplifications for MLX C
* use vectors instead of map
* update examples
2024-09-06 19:16:50 -07:00
Awni Hannun
7cca1727af
Fix slice data size ( #1394 )
...
* fix slice data size and add tests
* fix contiguous flag
* simplify stride and perform copy for non-contiguous arrays
* fix cpu
* comment
2024-09-04 19:10:43 -07:00
Bhargav Yagnik
11371fe251
Test to prevent bugs like #1386 ( #1391 )
...
* updated test_array for missing ops
* formatting changes
2024-09-04 17:24:30 -07:00
Awni Hannun
41c603d48a
fix jit reduce ( #1395 )
2024-09-04 14:03:10 -07:00
Angelos Katharopoulos
969337345f
Fix reduce edge case ( #1389 )
2024-09-01 21:37:51 -07:00
Awni Hannun
9592766939
add std as method ( #1387 )
...
* add std as method
* add std as method
2024-09-01 19:49:16 -07:00
Angelos Katharopoulos
58dca7d846
Fix copy in the sort primitive ( #1383 )
2024-08-31 08:32:14 -07:00
Awni Hannun
0d302cd25b
Fix compile with byte-sized constants ( #1381 )
2024-08-30 17:24:35 -07:00
Alex Barron
da691257ec
Fix overflow in quantize/dequantize ( #1379 )
...
* add 2d indices to prevent overflow
* use nthreads not out size
2024-08-30 13:32:41 -07:00
Angelos Katharopoulos
1600092e92
Patch bump ( #1376 )
2024-08-29 16:54:30 -07:00
Awni Hannun
dba2bd1105
Even Even Faster IO ( #1374 )
...
* even more faster io
* make reader pool static
* make python reader thread safe
* one more optimization
2024-08-29 16:05:40 -07:00
Alex Barron
28be4de7c2
Fix JIT reductions ( #1373 )
2024-08-28 16:39:11 -07:00
Awni Hannun
a6c3b38fba
Async load ( #1372 )
...
* async load
2024-08-28 14:21:55 -07:00
Awni Hannun
fcb65a3897
Even Faster I/O ( #1369 )
...
* try multithreading for faster IO
* smaller batch size
* Account for pread returning less than size
* nit
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2024-08-28 11:49:07 -07:00
Saanidhya
4e22a1dffe
Continuation of PR #1243 to solve issue #1240 ( #1365 )
...
* Solves issue #1240
* Correction
* Update python/mlx/utils.py
---------
Co-authored-by: Awni Hannun <awni@apple.com >
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
2024-08-28 11:40:41 -07:00
Awni Hannun
291cf40aca
Some fixes to typing ( #1371 )
...
* some fixes to typing
* fix module reference
* comment
2024-08-28 11:16:19 -07:00
Jeethu Rao
bd47e1f066
Fix neon_fast_exp and add more softmax tests ( #1367 )
2024-08-27 23:42:42 -07:00
Aditya Dhulipala
e6b223df5f
Pinv ( #875 )
2024-08-27 23:06:12 -07:00
Angelos Katharopoulos
e64349bbdd
Make eval just wait if all arrays are scheduled ( #1368 )
2024-08-27 17:01:22 -07:00
Angelos Katharopoulos
cdb59faea6
Adds send/recv ops in distributed ( #1366 )
2024-08-26 23:01:37 -07:00
Alex Barron
1d94ac3f90
Add optional headers to `mx.fast.metal_kernel` ( #1358 )
2024-08-26 21:45:45 -07:00
Awni Hannun
5f7d19d1f5
MPI ops in GPU stream for faster comms ( #1356 )
2024-08-26 15:12:50 -07:00
Awni Hannun
2fdf9eb535
Fix ternary for large arrays ( #1359 )
...
* fix ternary for large arrays
* fix
2024-08-26 11:22:27 -07:00
Awni Hannun
860d3a50d7
fix extension metal library finding ( #1361 )
2024-08-26 09:18:50 -07:00
Alex Barron
d1183821a7
int() and float() for mx.array ( #1360 )
2024-08-25 20:41:44 -07:00
Angelos Katharopoulos
8081df79be
Fix boolean all reduce bug ( #1355 )
2024-08-24 10:09:32 -07:00
Nripesh Niketan
64bec4fad7
Chore: update pre-commit hooks ( #1353 )
...
* Chore: update pre-commit refs
* run pre-commit
2024-08-24 06:46:36 -07:00
Alex Barron
b96e105244
Add grid_sample example to metal_kernel docs ( #1352 )
...
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`
* add grid sample to docs
* zero_outputs -> init_value
* add missing header for linux
2024-08-23 18:24:16 -07:00
Awni Hannun
3b4d5484c7
Bump extension MLX version ( #1350 )
...
* Bump extension MLX version
* fix some docs nits
2024-08-23 12:38:34 -07:00
Alex Barron
684e11c664
patch ( #1347 )
2024-08-23 10:42:02 -07:00
Angelos Katharopoulos
b57a52813b
Further reduction tuning ( #1349 )
...
* More reduction tuning
* Forgotten pdb
* Small column long row specialization
2024-08-23 10:35:25 -07:00
Alex Barron
da8deb2b62
fix bug with multiple attributes ( #1348 )
...
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-08-23 10:06:15 -07:00
Awni Hannun
98b6ce3460
Refactor reductions and fix scatter atomics for large sizes ( #1300 )
...
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2024-08-22 16:03:31 -07:00
Awni Hannun
f9e00efe31
fix nanobind and stub gen in circle ( #1346 )
2024-08-22 14:07:27 -07:00
Alex Barron
0fd2a1f4b0
Custom Metal Kernels from Python ( #1325 )
...
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
2024-08-22 13:46:29 -07:00
Awni Hannun
df3233454d
2d gather specialization ( #1339 )
2024-08-22 10:48:24 -07:00
Awni Hannun
82db84b899
bump nanobind + fix extension ( #1344 )
2024-08-21 16:05:07 -07:00
Awni Hannun
8ae751d3da
fix io ( #1343 )
...
* fix io
* comment
2024-08-21 13:14:46 -07:00
Awni Hannun
d40e76809f
Fix rope ( #1340 )
...
* add test
* fix rope
* fix test
2024-08-20 17:37:52 -07:00
Awni Hannun
bb1b76d9dc
RoPE with frequencies as optional input ( #1337 )
...
* start rope with freq input
* rope with frequencies
* nits
* fix bug
* fix bug + test
* cleanup
* optional base
2024-08-19 18:30:50 -07:00
Angelos Katharopoulos
9d26441224
Fix contiguity check ( #1336 )
...
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-08-19 16:05:06 -07:00
Awni Hannun
f12f24a77c
fix compiling with space in paths ( #1332 )
2024-08-15 16:39:24 -07:00
Awni Hannun
ae5b5cabfd
Fix optimizer reloading from checkpoint ( #1329 )
...
* fix optimizer reloading from checkpoint
* comment
2024-08-15 07:33:23 -07:00
Awni Hannun
d0630ffe8c
Read arrays from files faster ( #1330 )
...
* read faster
* faster write as well
* set default permission for linux
* comment
2024-08-14 20:09:56 -07:00
Alex Barron
99bb7d3a58
GPU mx.sign for complex64 ( #1326 )
2024-08-14 07:54:53 -07:00
Awni Hannun
63ae767232
fix transformer ( #1327 )
2024-08-13 16:04:26 -07:00
Awni Hannun
eaaea02010
Add isfinite ( #1318 )
...
* isfinite
* remove reduce test since fix is not complete
2024-08-13 14:49:28 -07:00
Bhargav Yagnik
a098bc92e0
Fix: Preserve input dtype in Dropout layer output ( #1323 )
...
* Fix: Preserve input dtype in Dropout layer output
- Modified Dropout implementation to ensure that the output dtype matches the input dtype.
- This resolves issue #1321.
* Update test cases in test_nn.py
- Revised test cases to align with updated dropout code
- Fixed assertion method: replaced self.assertTrue with self.assertEqual for accurate comparisons in test_nn.py (test_rope, test_alibi and test_dropout).
* updated dropout.py
2024-08-13 11:54:21 -07:00
Awni Hannun
1086dc4db0
patch ( #1320 )
2024-08-12 16:13:33 -07:00
Brian Keene
19fb69e2ed
Add memory_efficient_threshold kwarg to sdpa kernel ( #1319 )
...
Allows opt-in to the memory-efficient GPU shader at a prescribed sequence
length. Otherwise, uses aggregate MLX primitives for best latency.
2024-08-12 12:57:09 -07:00
Awni Hannun
9231617eb3
Move to nanobind v2 ( #1316 )
2024-08-08 17:17:46 -07:00
Alex Barron
32668a7317
CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv ( #1307 )
...
* add cholesky inv + tri inv
* always run tri_inv on cpu
* consistent naming
2024-08-08 15:18:02 -07:00
Angelos Katharopoulos
780c197f95
Fix test tolerance and patch bump ( #1315 )
2024-08-08 14:51:09 -07:00
Angelos Katharopoulos
eb8819e91e
Revert variance to be numerically stable ( #1314 )
2024-08-08 13:35:02 -07:00
Awni Hannun
30bbea2f08
Add gemv masked to JIT plus some fixes ( #1310 )
...
* add gemv masked to JIT plus some fixes
* some cleanup
* add utils
* fix
* fix 2
* more cleaning
* fix
* remove unused mps matmul support
* one more nit
* revert
2024-08-07 13:38:07 -07:00
Alex Barron
635ccd9e25
Add "edge" mode to mx.pad ( #1309 )
...
* Add edge padding mode
* fix pad in pooling
* string arg instead of enum
2024-08-06 11:23:10 -07:00
nicolov
8c9f0278b9
Add vmap to scatter ( #1200 )
...
* Add vmap to scatter
* updates
* vmap updates + a few more tests
* bug fix
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2024-08-05 20:12:27 -07:00
Awni Hannun
58d0e199e1
add bfloat conv for Winograd ( #1306 )
...
* add bfloat conv for Winograd
* accumulate in fp32
* accumulate in bf16
2024-08-05 15:51:13 -07:00
Awni Hannun
10b5835501
fix creating array from bf16 tensors in jax / torch ( #1305 )
2024-08-01 16:20:51 -07:00
Awni Hannun
6c8dd307eb
faster group norm ( #1304 )
2024-08-01 12:49:23 -07:00
Awni Hannun
43ffdab172
fix rope and random ( #1301 )
...
* fix rope and random
* comment
2024-07-31 16:18:25 -07:00
Awni Hannun
40b6d67333
Fixes for large arrays with a few ops ( #1299 )
...
* fixes for large arrays with a few ops
* fix bug
* fix all of copy
2024-07-30 17:18:39 -07:00
Alex Barron
c52d1600f0
Fused Affine Quantize/Dequantize ops ( #1282 )
...
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
2024-07-29 15:11:38 -07:00
Awni Hannun
aa1d6cadad
Fix docs latex build and nits ( #1297 )
...
* fix docs latex build and nits
* fix stub gen and try to clean up building
2024-07-29 11:44:06 -07:00
Atakan Tekparmak
6e06e3a904
feat: Added "tanh" option to GELU approximation ( #1268 )
2024-07-28 09:07:56 +02:00
Yaroslav
8cfb9fc0b8
Update requirements.txt ( #1291 )
2024-07-26 12:59:52 -07:00
Awni Hannun
7b456fd2c0
Array api ( #1289 )
...
* some updates for numpy 2.0 and array api
* fix array api doc
2024-07-26 10:40:49 -07:00
Awni Hannun
e9e53856d2
patch bump ( #1287 )
2024-07-25 11:42:09 -07:00
Anton Belov
5029894662
[Issue #1187 ] Add nan_to_num function initial attempt ( #1247 )
...
* initial attempt, working with wrong types
* not compiling; mx.float16 and mx.bfloat16 tests added
* fix nan to num
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2024-07-25 09:57:37 -07:00
Awni Hannun
baf9fa5f42
Einsum ( #1269 )
...
* einsum initial
* fix comma break
* sum axis was wrong
* small cleanups
* python binding
* changed bindings to resemble numpy
* remove todo comment
* comment changes
* add count of operands/inputs
* fail fast if operands list is empty
* ignore comma if no output
* einsum path matching numpy
* getting somewhere with path
* remove print
* it passes the first test
* moved einsum tests to separate file
* separated einsum path
* moved einsum naive
* remove space from equation
* fast fail if no operands passed
* update tests and remove printf
* small cleanup
* some more cleanups
* removed python helper file
* ack
* utilize std for finding min in vector
* duplicate def
* remove the tuple as it was unreadable
* moved einsum_naive back to ops
* remaining isn't needed
* avoid creating another set
* cleanup
* greedy path, start of naive einsum
* more einsum
* fix some bugs
* some more fixes, tests pass
* benchmark
* some simplify
* fix einsum and test
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
* add a bunch more tests and fix a bunch more bugs
* some docs nits
---------
Co-authored-by: dc-dc-dc <dgcruz983@gmail.com >
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com >
2024-07-25 09:36:44 -07:00
Jagrit Digani
7f914365fd
Fix GPU sort for large arrays ( #1285 )
...
* Fix GPU sort for large arrays
2024-07-24 14:37:10 -07:00
Paul Paczuski
ebd7135b50
Improve stability of BCE loss calculation for input probabilities close to or exactly 0 or 1 ( #1280 )
...
* Improve stability of BCE loss calculation
* Standardize comment
* Apply formatting with black via pre-commit
* Add usage recommendation to docstring
* Update python/mlx/nn/losses.py
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
2024-07-24 08:38:22 -07:00
fgranqvist
50eff6a10a
Implement sampling from laplace distribution. ( #1279 )
2024-07-24 15:15:37 +02:00
Alex Barron
c34a5ae7f7
Fix bfloat16 Hadamard ( #1283 )
...
* fix bfloat16 hadamard
* add scale
* review comments
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-07-23 14:54:43 -07:00
Awni Hannun
e2aa6ec8ae
some fixes ( #1281 )
2024-07-23 11:49:05 -07:00
toji
6768c6a54a
Adding missing type hints ( #1243 )
...
* added type hints for `run`, `tree_map` and `tree_map_with_path`
* fix lint
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2024-07-23 07:29:38 -07:00
Tim Gymnich
6307d166eb
Fix overflow / underflow handling for expm1f ( #1278 )
...
* Fix overflow / underflow handling for expm1f
* update tests
2024-07-23 07:29:06 -07:00
Awni Hannun
1fba87b0df
Fix leak with multi-output primitives ( #1274 )
...
* fix leak with multi-output primitives
* hopefully an actual fix
2024-07-23 06:34:18 -07:00
Awni Hannun
df124e018a
fix gguf ( #1273 )
...
* fix gguf
* comment
2024-07-18 07:35:35 -07:00
Cheng
2f83d6e4b7
Do not release buffers on exit ( #1142 )
2024-07-15 15:12:24 -07:00
Feng Shijie
987785d8d7
Fix typo and missing header ( #1266 )
2024-07-15 08:20:24 -07:00
Awni Hannun
8c01a7893b
minor fix in optimizer + docs ( #1264 )
2024-07-12 12:18:02 -07:00
Awni Hannun
218047c75a
docs fixes ( #1263 )
2024-07-11 15:59:07 -07:00
Alex Barron
d0da74209b
version bump ( #1260 )
2024-07-11 11:17:55 -07:00
Angelos Katharopoulos
5c1fa64fb0
Custom transforms ( #1246 )
2024-07-10 18:00:01 -07:00
Alex Barron
a3c287354f
Fast Hadamard Transform ( #1249 )
...
* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent
2024-07-09 20:39:01 -07:00
Angelos Katharopoulos
03cf033f82
Fix reshape copy bug ( #1253 )
2024-07-07 21:37:00 -07:00
Alex Barron
bdb36c9a63
add zero vjps for bitwise ops and gather w.r.t. index ( #1256 )
2024-07-07 21:34:59 -07:00
Awni Hannun
20bb301195
CPU binary reduction + Nits ( #1242 )
...
* very minor nits
* reduce binary
* fix test
2024-06-28 13:50:42 -07:00
Awni Hannun
d6383a1c6a
version bump ( #1239 )
2024-06-27 10:43:13 -07:00
Angelos Katharopoulos
b05bcfd27f
Fixes segfault when compiling checkpointed functions ( #1235 )
2024-06-26 16:14:45 -07:00
Alex Barron
2615660e62
Fix strided sort bug ( #1236 )
...
* Use output strides in sort kernel
* fix zero strides bug
2024-06-26 14:32:11 -07:00
Awni Hannun
5b0af4cdb1
fix donation condition for compilation ( #1237 )
2024-06-26 09:04:05 -07:00
Jagrit Digani
8c2e15e6c8
Accelerate import updates for iOS ( #1227 )
...
* Update veclib and bnns includes to #include <Accelerate/Accelerate.h> for compatibility with iOS
* Mark float literals in softmax.cpp as float16_t to fix errors on iOS
* Add arm neon vector operation guards
* Redirect to common backend for consistency
2024-06-26 09:01:50 -07:00
Awni Hannun
56c8a33439
Get metal version from xcode ( #1228 )
...
* get metal version from xcode
* typo
* fix
2024-06-26 07:02:11 -07:00
David Koski
4eef1e8a3e
fix typo ( #1215 )
2024-06-24 13:36:35 -07:00
Alex Barron
95d11bda06
Fix NumPy 2.0 pickle test ( #1221 )
...
* fix numpy version <2 temporarily
* typo
* better fix
* Fix just for bfloat16
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-06-23 05:47:22 -07:00
Awni Hannun
af9079cc1f
version bump ( #1212 )
2024-06-14 11:28:51 -07:00
Jagrit Digani
2d6cd47713
Masked gemv ( #1211 )
2024-06-14 09:52:26 -07:00
Awni Hannun
fe3167d7ea
smaller CPU binary ( #1203 )
...
* smaller CPU binary
* fix no cpu build
2024-06-14 09:46:55 -07:00
Awni Hannun
31e134be35
Build for macOS 15 ( #1208 )
...
* Build for macos 15
* metal32 as well
* comment
---------
Co-authored-by: Awni Hannun <Awni Hannun>
2024-06-13 13:31:44 -07:00
Awni Hannun
e84ba8056d
only allow openmpi ( #1209 )
2024-06-13 12:14:44 -07:00
Fangjun Kuang
f20e97b092
minor fixes ( #1194 )
...
* minor fixes
* fix build errors
2024-06-12 22:06:49 -07:00
Alex Barron
934683088e
Refactor JIT for unary/binary/ternary ops ( #1206 )
...
* refactor unary/binary/ternary ops
* get_primitive_string util
---------
2024-06-12 14:22:12 -07:00
Awni Hannun
de2b9e7d0a
Fix kernel deps to reduce build times ( #1205 )
2024-06-12 11:17:39 -07:00
Alex Barron
dd7d8e5e29
Add Quantized Ops to the JIT ( #1204 )
...
* JIT for quantized ops
* remove unused imports
* address comments
* fix imports
* second attempt to fix imports
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-06-12 09:47:12 -07:00
Awni Hannun
df964132fb
fix scatter + test ( #1202 )
...
* fix scatter + test
* fix test warnings
* fix metal validation
2024-06-11 14:35:12 -07:00
Awni Hannun
709ccc6800
install mpi for release build ( #1199 )
2024-06-10 10:09:32 -07:00
Awni Hannun
cf236fc390
version ( #1191 )
2024-06-06 17:16:40 -07:00
Alex Barron
27d70c7d9d
Feature complete Metal FFT ( #1102 )
...
* feature complete metal fft
* fix contiguity bug
* jit fft
* simplify rader/bluestein constant computation
* remove kernel/utils.h dep
* remove bf16.h dep
* format
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-06-06 12:57:25 -07:00
nicolov
0e585b4409
Add docstring for scatter ( #1189 )
...
* Add docstring for scatter
* docs nits
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2024-06-06 11:51:25 -07:00
Angelos Katharopoulos
0163a8e57a
Add docs for the distributed namespace ( #1184 )
2024-06-06 11:37:00 -07:00
Awni Hannun
578842954c
fix jit scan when output doesn't have primitive ( #1190 )
2024-06-06 07:24:58 -07:00
Awni Hannun
496315fe1d
Fix scan ( #1188 )
...
* fix scan
* improve grid size
* fix cpu cummax
2024-06-05 14:21:58 -07:00
Angelos Katharopoulos
0fe6895893
Fix the hard-shrink test ( #1185 )
2024-06-04 16:22:56 -07:00
Nikhil Mehta
0b7d71fd2f
Add softmin, hardshrink, hardtanh ( #1180 )
...
---------
Co-authored-by: Nikhil Mehta <nikmehta@tesla.com >
2024-06-04 15:48:18 -07:00
Awni Hannun
83b11bc58d
Fix Metal API validation for empty concat ( #1183 )
2024-06-04 13:17:08 -07:00
Alex Barron
375a8bbdcc
Add some internal GPU apis ( #1177 )
...
* Add unary/binary/ternary/slice/concat internal GPU ops
* add pad internal op
* formatting + no_cpu fix
2024-06-04 09:24:26 -07:00
Awni Hannun
ea9090bbc4
Add view op ( #1179 )
...
* add view primitive
* nit
* fix view
2024-06-04 08:05:27 -07:00
nicolov
81def6ac76
Fix benchmark ( #1175 )
2024-06-04 07:50:46 -07:00
Angelos Katharopoulos
3de8ce3f3c
In place all-reduce and forgiving init ( #1178 )
2024-06-03 16:47:47 -07:00
Alex Barron
4d485fca24
Add defines include ( #1176 )
...
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-06-03 09:50:10 -07:00
Brian Keene
1865299a30
Metal shaders for memory efficient self attention on large sequences ( #964 )
...
* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
2024-06-03 09:16:19 -07:00
Dominik Schlösser
3576b547c5
Doc error for default for scale in SinusoidalPositionalEncoding ( #1174 )
2024-06-02 13:42:45 -07:00
Awni Hannun
079882495d
version bump ( #1172 )
2024-05-31 12:29:12 -07:00
K Venkat Ramnan
ab977109db
feat: Added dlpack device ( #1165 )
...
* feat: Added dlpack device
* feat: Added device_id to dlpack device
* doc: updated conversion docs
* doc: updated numpy.rst dlpack information
* Update docs/src/usage/numpy.rst
---------
Co-authored-by: Venkat Ramnan Kalyanakumar <venkatramnankalyanakumar@Venkats-MacBook-Air.local >
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
2024-05-31 12:29:01 -07:00
Awni Hannun
fd1c08137b
stable cumprod grad at 0 ( #1167 )
2024-05-31 12:28:42 -07:00
Jagrit Digani
76b6cece46
Fix multi-block sort stride management ( #1169 )
...
* Fix multi-block sort stride management
* Add seed to tests
2024-05-31 11:10:54 -07:00
Jagrit Digani
9f0df51f8d
Fix matvec vector stride bug ( #1168 )
2024-05-29 12:18:28 -07:00
Awni Hannun
e7a2a3dcd1
Fix a couple bugs ( #1161 )
...
* fix jit reduce for RMS norm
* make strides a single buffer
* better eval error message
* fix compiling with inf and bf16
* fix cpu compile with bf16
2024-05-28 15:18:18 -07:00
Awni Hannun
a87ef5bfc1
fix broadcast bug in bitwise ops ( #1157 )
2024-05-24 11:44:40 -07:00
Awni Hannun
9f9cb7a2ef
version bump ( #1154 )
2024-05-23 18:08:08 -07:00
Awni Hannun
7e26fd8032
Option to JIT steel gemm / conv ( #1139 )
2024-05-23 18:07:34 -07:00
Jagrit Digani
eab2685c67
Float mask update ( #1152 )
...
* Float mask update
* Update CPU impl
2024-05-23 17:20:44 -07:00
Angelos Katharopoulos
50dfb664db
Comms ( #1097 )
...
* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed
2024-05-23 17:04:02 -07:00
Awni Hannun
0189ab6ab6
More jitting ( #1132 )
...
* docs + circle min size build
* jit scan, arange, softmax
* add sort
* jit reductions
* remove print
* fix deps
* clean includes / nits
2024-05-23 16:23:44 -07:00
Rifur13
9401507336
Add groups to 2-D convolutions ( #1129 )
...
* Added groups to 2-D convolutions. Only implemented for **some** specializations.
Also fixed 1D grouped convs with different kernel strides and added more tests.
* fix channels condition
2024-05-22 20:01:44 -07:00
Awni Hannun
eb8321d863
list based indexing ( #1150 )
2024-05-22 15:52:05 -07:00
Abe Leininger
79ef49b2c2
add mx.trace ( #1143 ) ( #1147 )
...
* working c++ trace implementation
* updated throw + added overloads
* added python binding for trace function
* pre-commit reformatting
* add trace to docs
* resolve comments
* remove to_stream call
2024-05-22 15:50:27 -07:00
Awni Hannun
e110ca11e2
Fix offset bug for device buffers ( #1151 )
...
* fix bug with large offsets for buffers
* add a test
* remove test as it's too big for small machines
2024-05-22 15:50:05 -07:00
Awni Hannun
226748b3e7
JIT compile option for binary minimization ( #1091 )
...
* try cpp 20 for compile
* unary, binary, ternary in jit
* nits
* fix gather/scatter
* fix rebase
* reorg compile
* add ternary to compile
* jit copy
* jit compile flag
* fix build
* use linked function for ternary
* some nits
* docs + circle min size build
* fix extension
* fix no cpu build
* improve includes
2024-05-22 12:57:13 -07:00
Awni Hannun
d568c7ee36
Rename block sparse ( #1149 )
...
* block_sparse_mm to gather_mm
* rename
* nit
2024-05-22 07:48:34 -07:00
Awni Hannun
e6fecbb3e1
Some fixes in docs ( #1141 )
...
* fixes in docs
* nit
2024-05-20 11:51:47 -07:00
Angelos Katharopoulos
da83f899bb
Improve qvm speed ( #1140 )
2024-05-20 09:20:44 -07:00
jlwitthuhn
7e5674d8be
Treat 'minimum' differently in cosine decay ( #1138 )
2024-05-20 08:00:48 -07:00
Shixian Sheng
0a558577bf
Update README.md ( #1136 )
2024-05-20 06:16:40 -07:00
Awni Hannun
fb71a82ada
Fix copy bug with many dims ( #1137 )
2024-05-17 21:10:03 -07:00
Awni Hannun
23406c9e9e
Choose the right MLX bf16 for extensions ( #1135 )
...
* default to custom bf
* choose right bf
* fix extensions
* fix circle conf
2024-05-17 15:09:28 -07:00
Luca Arnaboldi
b3ec792380
Implemented Cholesky on CPU ( #1119 )
2024-05-17 12:31:59 -07:00
Awni Hannun
6a9b584f3d
patch bump ( #1131 )
2024-05-16 20:51:33 -07:00
Awni Hannun
81dd33af66
allow conversion to dlpack ( #1120 )
2024-05-16 16:11:37 -07:00
Awni Hannun
8b76571896
Fix extensions ( #1126 )
...
* fix extensions
* title
* enable circle
* fix nanobind tag
* fix bug in doc
* try to fix config
* typo
2024-05-16 15:36:25 -07:00
Angelos Katharopoulos
e78a6518fa
Block sparse qmm ( #1124 )
2024-05-16 15:24:14 -07:00
Awni Hannun
1873ffda01
Detect metal version and propagate correctly for JIT ( #1109 )
...
* detect metal version and propagate correctly for JIT
* remove softmax
* fix versions
2024-05-15 17:42:09 -07:00
Jacket
c417e42116
[Fix] minor typo in default argument for argpartition's "axis" parameter ( #1125 )
...
According to the documentation, argpartition's axis parameter can be None, but due to a previous typo it could not actually accept a None value.
2024-05-15 15:25:25 -07:00
Jagrit Digani
358e1fd6ab
Fused GEMM ( #1123 )
...
* Basic gemm working
* Update addmm
* Clear out steel_gemm and steel_addmm kernels
* Fuse and clear out gather gemm
* Update objc releases
2024-05-15 10:30:41 -07:00
Awni Hannun
631dfbe673
fix scatter index bug ( #1122 )
2024-05-14 15:04:58 -07:00
Cheng
56a4eaed72
Pass missing stream arg in array.flatten ( #1111 )
2024-05-14 06:50:16 -07:00
Cheng
bf925d9dc7
Move args in conv_general ( #1118 )
...
Also fix a typo where padding_lo was passed as padding_hi.
2024-05-14 06:50:09 -07:00
Cheng
1a7ed5dcb6
Fill vector with constructor instead of fill_n ( #1113 )
2024-05-14 06:28:55 -07:00
Cheng
5be5daa6ef
Use compiled function in Sigmoid module ( #1116 )
2024-05-14 06:25:57 -07:00
Cheng
60cb11764e
Use correct module type in quantized.py ( #1115 )
2024-05-14 06:25:42 -07:00
Cheng
cbd5445ea7
The tile op does not accept None as reps ( #1117 )
2024-05-14 06:25:25 -07:00
Cheng
2c7e9b5158
Add missing docs for some ops ( #1110 )
2024-05-14 06:09:05 -07:00
Mike Drob
2263e4b279
Experiment with medium machines for CI ( #1000 )
2024-05-13 19:40:19 -07:00
Awni Hannun
863039da4c
Allow scatter type exception to be caught by checking in op ( #1077 )
...
* allow exception to be caught in main thread
* only for gpu
* more detailed scatter error
2024-05-13 17:43:53 -07:00
Awni Hannun
7178ac0111
No CPU option for binary minimization ( #1105 )
...
* no cpu build option
* docs
* fix
2024-05-13 16:08:11 -07:00
Ravindra R. Jaju
e7f9710499
Fix typo in a variable name in example code. ( #1104 )
...
* Fix typo in a variable name in example code.
* Rename df2dx2 to d2fdx2 - the appropriate naming for the second derivative
* Update CONTRIBUTING.md - add needed python packages, and a virtual-env hint
* Revert "Fix typo in a variable name in example code."
This reverts commit bc10a17534.
* Rename df2dx2 to d2fdx2
2024-05-13 06:04:23 -07:00
Max-Heinrich Laves
ff4223904d
Conv3d ( #993 )
...
* added conv3d
implemented explicit_gemm_conv_ND_cpu and bounds checks for slow_conv_3D
* incorporated reviewer comments
* fixed test
* reduced tensor shapes in test for conv3d
* Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
Reviewer suggestion
2024-05-11 06:15:02 -07:00
Awni Hannun
a9f80d60f6
improve error messaging in eval ( #1101 )
2024-05-10 10:04:07 -07:00
Alex Barron
2e158cf6d0
Add conjugate operator ( #1100 )
...
* cpu and gpu impl
* add mx.conj and array.conj()
---------
Co-authored-by: Alex Barron <abarron22@apple.com >
2024-05-10 07:22:20 -07:00
Awni Hannun
8bd6bfa4b5
version ( #1099 )
2024-05-09 17:52:39 -07:00
Awni Hannun
8b1906abd0
Add compiler flags to disable safetensors and gguf ( #1098 )
...
* with docs
* nit
2024-05-09 17:39:44 -07:00
Awni Hannun
06375e6605
Split encoders in non-concurrent context with a max ops per encoder ( #1085 )
...
* split encoders
* fix race
2024-05-09 16:21:02 -07:00
Awni Hannun
b21242faf1
Allow unary ops to accept array like ( #1093 )
2024-05-09 09:36:02 -07:00
Rahul Yedida
cc05a281c4
Added ArcTan2 operation ( #1079 )
...
* Added ArcTan2 operation
* Cleanup, bug fixes from code review
* Minor cleanup, fixed Linux tests
2024-05-08 08:35:15 -07:00
Jagrit Digani
fe96ceee66
Update block offset adjustment to be in size_t ( #1087 )
2024-05-08 08:10:23 -07:00
Awni Hannun
9814a2ae12
fix conversion to array ( #1070 )
2024-05-06 16:02:49 -07:00
Shubham
6992498e7a
add keyword positional ( #1081 )
2024-05-06 07:18:49 -07:00
Awni Hannun
21623156a3
Reset peak memory ( #1074 )
...
* reset peak memory
* fix linux
* nits in docs
2024-05-03 17:12:51 -07:00
Nripesh Niketan
79c859e2e0
feat: implement clip_grad_norm ( #1043 )
...
* feat: implement `clip_grad_norm`
* pre-commit
* Add test for clip_grad_norm function in test_optimizers.py
* small fixes
* fix
* lint
* Update tree_reduce
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* Refactor clip_grad_norm function to include documentation and improve readability
* format docstring
* Add acknowledgements
* text wrap
* pre-commit
* nits in docs
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
Co-authored-by: Awni Hannun <awni@apple.com >
2024-05-03 09:07:02 -07:00
Awni Hannun
b00ac960b4
change initial memory limits and add memory size to device info ( #1064 )
2024-05-03 06:50:15 -07:00
Awni Hannun
02a9fc7bfa
Patch bump ( #1067 )
...
* version
* use 0.12.2
2024-05-02 16:37:31 -07:00
Jagrit Digani
f390957685
Block sparse mm ( #1058 )
2024-05-02 14:03:58 -07:00
Angelos Katharopoulos
17f57df797
Improvements in the quantizer and dequantization kernel ( #1061 )
2024-05-01 18:19:11 -07:00
Awni Hannun
7f7b9662ea
Fix leak for multi-output primitives which are never detached ( #1059 )
...
* fix multi output leak
* ignore arrays that will be detached
* add some comments
* stray print
2024-05-01 07:31:45 -07:00
Awni Hannun
19bef39f5c
Add a mx.metal.device_info ( #1060 )
...
* device info
* add variant
* fix linux
* fix doc
2024-04-30 15:47:27 -07:00
Nripesh Niketan
a30e7ed2da
feat: metal formatting and pre-commit bump ( #1038 )
...
* feat: metal formatting and pre-commit bump
* add guards
* update
* more guards
* small fix
* Refactor instantiation of ternary types in ternary.metal
* fix scan.metal
2024-04-30 07:18:09 -07:00
Angelos Katharopoulos
8db7161c94
Bug fix in quantize ( #1054 )
2024-04-29 20:55:04 -07:00
Awni Hannun
09f1777896
fix slice update indexing ( #1053 )
2024-04-29 12:17:40 -07:00
Jacket
490c0c4fdc
[Fix] expand axes for dimension with integer indices in mlx_slice_update ( #1035 )
...
* Not sure if this is correct
* Format
* Edit tests
* Add negative test
* Format
* add one more test
---------
Co-authored-by: Awni Hannun <awni@apple.com >
2024-04-29 07:57:28 -07:00
Rifur13
c4a471c99d
Add groups to Conv1d ( #948 )
...
* Add conv1d grouped convs on CPU
* Add GPU support
* Parallelize inside metal kernel
* cleanup
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
* New unfold kernel + remove unused code
* Remove copy and refactor
* Update vjp and reuse steel gemm
* Fixed groups on cpu
* Fix metal validation
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com >
2024-04-27 06:24:57 -07:00
Awni Hannun
86f495985b
Add bitwise ops ( #1037 )
...
* bitwise ops
* fix tests
2024-04-26 22:03:42 -07:00
Awni Hannun
67d1894759
fix order device -> scheduler ( #1039 )
2024-04-26 13:46:41 -07:00
Awni Hannun
5bfe89bdb1
Cpp docs ( #1036 )
...
* start of C++ docs
* fix stream doc
* only include ops for now
2024-04-26 12:56:05 -07:00
Angelos Katharopoulos
82463e9938
Bump the version to 0.12 ( #1034 )
2024-04-25 14:18:08 -07:00
Awni Hannun
771575d27b
Expose function to clear memory cache ( #1032 )
...
* expose function to clear memory cache
* fix linux build
* fix metal tests
2024-04-24 16:48:51 -07:00
Angelos Katharopoulos
20a01bbd9f
Simplifying and improving qmm ( #1030 )
2024-04-24 13:07:45 -07:00
Angelos Katharopoulos
ec8578d41a
Fix quantization of all 0s ( #1028 )
2024-04-24 00:40:42 -07:00
Aneesh Shetty
d0dbfe0b97
Adds radians and degrees ( #1011 )
2024-04-22 11:17:49 -07:00
Awni Hannun
3d405fb3b1
Add synchronize function ( #1006 )
...
* add synchronize function
* fix linux
* fix and fix docs
* fix test
* try synchronize in stream destroy
* synchronize works for both cpu and gpu
2024-04-22 08:25:46 -07:00
Angelos Katharopoulos
b0012cdd0f
Bump the patch version for the quants ( #1018 )
2024-04-19 20:28:34 -07:00
Angelos Katharopoulos
84d61d27aa
Make sure 0 is represented in the quantization ( #1016 )
2024-04-19 19:47:26 -07:00
Awni Hannun
ed83908931
fix gguf loading quants ( #1014 )
...
* fix gguf loading quants
* fix nanobind install
* actual fix
2024-04-19 12:24:07 -07:00
Angelos Katharopoulos
ef5f7d1aea
Fix buffer protocol buffer size designation ( #1010 )
2024-04-19 06:06:13 -07:00
Awni Hannun
090ff659dc
bump ( #1007 )
2024-04-18 13:18:43 -07:00
Jagrit Digani
85c8a91a27
Fix mask broadcasting bug and add relevant test (#1003)
2024-04-17 17:33:48 -07:00
Piotr Rybiec
581b699ac9
avgpool, not maxpool (#1002)
2024-04-17 08:26:22 -07:00
Awni Hannun
8a0677d56d
Shared events for synchronization + async eval (#998)
* more async eval
* fix rebase
* try correct async eval
* fix async
* more tests for async eval
* use shared events for synchronization
* comment + cleanup
* with autorelease pool
* fix no metal build
* fix compile
* fix patch
* don't eval if async eval'd
* don't use is_evaled
* comments
* more multi stream tests
* try and cleanup use of is_evaled
* use a status flag
2024-04-17 06:16:02 -07:00
Jagrit Digani
b18468bf81
Masked mm (#978)
* Add block masked matmul op and primitive
2024-04-16 14:45:39 -07:00
Shiyu
107ba2891a
gelu tanh approx (#989)
* gelu tanh approx
* gelu tanh approx
* replace gelu approx with tanh approach
* fix comments
* fix comment
2024-04-15 19:49:00 -07:00
Awni Hannun
cd9e184529
Quantize embedding (#994)
* quantize embedding
* rename as_linear + comment
* consistency in docs
* fix test
2024-04-15 16:42:10 -07:00
Alex Barron
2e7c02d5cd
Metal FFT for powers of 2 up to 2048 (#915)
* add Metal FFT for powers of 2
* skip GPU test on linux
* fix contiguity bug
* address comments
* Update mlx/backend/metal/fft.cpp
* Update mlx/backend/metal/fft.cpp
* fix bug in synch
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 21:40:06 -07:00
Awni Hannun
ae18326533
No copy command encoder (#986)
* no copy command encoder
* up layer norm test tolerances
2024-04-11 21:15:36 -07:00
Alex Shepard
91eba8e485
fix for grammatical typo in docs (#988)
thanks for mlx!
2024-04-11 17:02:06 -07:00
Awni Hannun
d07e295c62
bumpity bump (#987)
2024-04-11 12:48:52 -07:00
Angelos Katharopoulos
dce4bd74a4
Add ArrayDesc destructor to avoid possible stack overflow (#982)
2024-04-11 11:37:02 -07:00
Nripesh Niketan
ffff671273
Update pre-commit hooks (#984)
2024-04-11 07:27:53 -07:00
Awni Hannun
12d4507ee3
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
Awni Hannun
8580d997ff
Try a stack-based DFS for eval (#980)
* rebase
* nit
* fix eval in vmap
2024-04-10 17:05:13 -07:00
Shiyu
061cf9a4ce
Upsample with bicubic interpolation (#967)
2024-04-10 15:47:22 -07:00
Awni Hannun
99abb9eff4
Async eval (#972)
2024-04-09 18:34:00 -07:00
Luca Arnaboldi
fffe072028
Implementation of mlx.random.multivariate_normal (#502) (#877)
* Implementation of mlx.random.multivariate_normal (#502)
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Updated typo in docstring
* Restricted multivariate_normal to float32
* Generic mean and variance shapes
* Review edits
* Update mlx/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Test for ndim of mean and cov
* nits
* smaller size for test
* fix broadcasted sampling
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-09 13:50:12 -07:00
Abe Leininger
a1a31eed27
Add mx.meshgrid (#961)
2024-04-09 11:43:08 -07:00
Awni Hannun
ae812350f9
use string (#976)
2024-04-09 11:22:00 -07:00
Awni Hannun
b63ef10a7f
Extensions (#962)
* start to fix extensions
* mostly fixed extensions
* fix extension build
* couple more nits
2024-04-09 08:50:36 -07:00
Awni Hannun
42afe27e12
std and expm1 (#973)
* std and expm1
* actually add expm1
* fix linux
* fix vjp
* relax tol for linux test
* Add it to the compilable primitives
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-04-08 14:26:01 -07:00
Awni Hannun
76e63212ff
Enable bfloat scan (#974)
* enable bfloat scan
* fix tests
2024-04-08 12:29:19 -07:00
Awni Hannun
aac2f9fb61
Improve profiling with gpu tracing (#969)
* improve profiling with gpu tracing
* fix for linux
* nit
* doc fix
* fix example
2024-04-07 21:47:43 -07:00
Awni Hannun
bddf23f175
patch bump (#956)
2024-04-04 11:56:37 -07:00
Awni Hannun
039da779d1
No quant reshape (#957)
* precise option on cpu
* remove print
* remove reshape in quant matmul
* no quant reshape
2024-04-04 11:52:12 -07:00
Awni Hannun
d88d2124b5
segfault layer norm grad (#955)
2024-04-04 10:59:15 -07:00
Awni Hannun
e142aaf8a1
Option for precise softmax (#953)
* precise softmax
* Add an equivalency check
* Make the threadgroup memory definition fixed
* precise cpu softmax
* precise option on cpu
* remove print
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-04-04 08:32:35 -07:00
AmirHossein_Razlighi
0caf35f4b8
Better exceptions in case of invalid operations on mlx.core.array (#910) (#926)
* Nicer exceptions for ops on non-arrays
2024-04-02 21:11:24 -07:00
Angelos Katharopoulos
3fc993f82d
Properly handle negative axes in python vmap (#944)
2024-04-02 18:07:23 -07:00
Awni Hannun
741eb28443
fix a couple bugs (#952)
2024-04-02 12:07:41 -07:00
Angelos Katharopoulos
1a87dc5ea8
Fix compile fusion for multi-output edge cases (#950)
* Fix compile fusion for multi-output edge cases
* Add a test for multi-output compile
2024-04-02 08:42:31 -07:00
Awni Hannun
2427fa171e
Fix cpu compile (#934)
* fix one cpu bug, test for another
* format hooks
* simplify contiguity check for cpu compile
* fix
* add back donation
* comment
2024-04-01 17:37:12 -07:00
Jagrit Digani
639e06e1f3
Indexing bug fix (#947)
* Fix axes accounting
* Add tests
2024-04-01 12:18:50 -07:00
Angelos Katharopoulos
02fedbf1da
Fix array initialization from list (#942)
* Fix array initialization from list
* Change the error message in the test
2024-04-01 06:27:52 -07:00
Angelos Katharopoulos
110d9b149d
Layer norm grad fix donation bug (#941)
* add layer norm grad test
* Fix donation bug in layernorm vjp
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-01 06:15:50 -07:00
Angelos Katharopoulos
9cbff5ec1d
Fix typo in qmm check (#940)
2024-03-31 19:15:44 -07:00
Suvan Kumar
433c0206b0
Update saving_and_loading.rst (#929)
Update saving / load docs.
2024-03-30 14:30:06 -07:00
Awni Hannun
8915901966
Donation bug (#933)
* donation
* buf
* fix bug in softmax
* comment
* remove print
2024-03-30 10:08:54 -07:00
AmirHossein_Razlighi
f48bc496c7
Comparing python objects (such as list/tuple) with mlx.core.array (#920)
* add implicit conversion of list to array for equality constraint
* add tests for array equality
* add test for tuple and array equality
* return False if __eq__ arg is list or tuple
* write tests for equality
* update the rule of comparison for __ge__/__gt__/__lt__/__le__
* add a helper function for detecting mlx.core.array
* return true in case of inequality
* debug minor issue regarding detecting mlx array
* add tests for inequality comparisons
* add name for contribution
* reformat files using pre-commit
* update tests for float
* update tests for inequality
* raise exception in case of invalid comparisons
* use isinstance instead of string comparison
* replace "is_convirtable_to_array" with previous logic
* remove throwing exceptions for other operations
* just a comment
* minor changes for efficiency
* optimize a utils function
* change the function name
* Update ACKNOWLEDGMENTS.md
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-29 06:52:30 -07:00
Cheng
913b19329c
Add missing && when forwarding args (#925)
Without the &&, args would be copied and perfect forwarding would not work.
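A minimal C++ sketch of what the missing `&&` changes. It uses a hypothetical `Tracked` type and hypothetical wrapper functions, not MLX code. A by-value parameter pack copies every lvalue argument and then hands it on as an rvalue. A forwarding reference (`Args&&`) combined with `std::forward` passes the argument through without a copy and keeps its value category.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical instrumented type that counts how often it is copied.
struct Tracked {
  static inline int copies = 0;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept = default;
};

// Overloads that report which value category reached the callee.
inline std::string_view category(const Tracked&) { return "lvalue"; }
inline std::string_view category(Tracked&&) { return "rvalue"; }

// Missing &&: each argument is copied into the parameter, and
// std::forward<Args> then always yields an rvalue.
template <typename... Args>
std::string_view wrap_by_value(Args... args) {
  return category(std::forward<Args>(args)...);
}

// Forwarding reference: no copy, and lvalues stay lvalues.
template <typename... Args>
std::string_view wrap_forwarding(Args&&... args) {
  return category(std::forward<Args>(args)...);
}
```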
2024-03-29 06:48:29 -07:00
Awni Hannun
d8cb3128f6
bump (#924)
* bump
* fix version
2024-03-28 16:14:55 -07:00
Angelos Katharopoulos
5f9ba3019f
Fix qmm_t for unaligned cases (#923)
2024-03-28 15:34:57 -07:00
Cheng
46caf0bef0
Remove unnecessary string copies (#891)
1. Use string_view instead of string when there is no need for copy.
2. Otherwise move string when possible.
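A short sketch of the two rules above; the names `has_suffix` and `Named` are hypothetical. When the callee only reads the text, it borrows it through `std::string_view`. When the callee must keep the text, it takes a `std::string` by value and `std::move`s it into place.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Read-only use: std::string_view borrows the characters, so callers can
// pass a std::string or a string literal without allocating a copy.
inline bool has_suffix(std::string_view text, std::string_view suffix) {
  return text.size() >= suffix.size() &&
         text.substr(text.size() - suffix.size()) == suffix;
}

// Ownership needed: take the string by value and move it into the member,
// so an rvalue argument is moved rather than copied.
class Named {
 public:
  explicit Named(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};
```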
2024-03-28 13:14:59 -07:00
Jack Mousseau
45f636e759
Add Metal debug option and capture functions (#707)
* Add Metal debug option and capture functions
* Add brief Metal debugger documentation
* doc nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-28 09:40:31 -07:00
Cheng
a7b404ff53
Use uintptr_t instead of size_t to store function id (#916)
Also does some small cleanup of the compile cache code.
2024-03-28 06:37:59 -07:00
Angelos Katharopoulos
c4fd0e5ede
Fixes #918 bug in compile_tests (#919)
2024-03-27 22:37:37 -07:00
Cheng
bab5386306
Make ops aware of rvalues: astype/as_strided/copy/full (#895)
When composing transforms, many temporary arrays are created and passed
to the next primitive; by making ops accept args by value, we can avoid
many copies of these temporary arrays.
2024-03-27 22:35:55 -07:00
Angelos Katharopoulos
aca7584635
Fix OOB read in qmv when non-divisible by blocksize (#917)
2024-03-27 22:18:35 -07:00
AmirHossein_Razlighi
d611251502
Support Chaining for some of functionalities of nn.Module (#885) (#897)
* add chaining support for some of the functionalities of "nn.Module"
* reformat
* change the return types
* remove return types
* add return type with forward referencing
* add tests for chaining
* add name to contributors
* Update python/mlx/nn/layers/base.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/nn/layers/base.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* update docstring
* update docstrings
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-27 19:58:29 -07:00
Cheng
f30b659291
Make MLX build on x64 macOS (#901)
The arm64 MacBook Pros are heavy and I usually carry my Intel one for
mobile use; it would be nice if I could play with MLX on it.
To build for x64, users must pass `MLX_ENABLE_X64_MAC` to cmake:
CMAKE_ARGS='-DMLX_ENABLE_X64_MAC=ON' python setup.py
2024-03-27 06:14:29 -07:00
Cheng
90dfa43ff1
Don't use make_unique to create shared_ptr (#902)
The code compiled only because shared_ptr has a constructor that accepts
a unique_ptr rvalue and takes over its ownership.
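For illustration, using a hypothetical `Node` type: the first function compiles through `std::shared_ptr`'s converting constructor from `std::unique_ptr&&`. That path typically needs a separate allocation for the control block. `std::make_shared` usually allocates the object and the control block together in one allocation.

```cpp
#include <cassert>
#include <memory>

struct Node { int value; };  // hypothetical payload

// Compiles via shared_ptr(unique_ptr&&), which takes over ownership, but
// typically costs two allocations: the object, then a control block.
inline std::shared_ptr<Node> make_node_indirect(int v) {
  return std::make_unique<Node>(Node{v});
}

// Preferred: make_shared typically allocates object and control block once.
inline std::shared_ptr<Node> make_node(int v) {
  return std::make_shared<Node>(Node{v});
}
```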
2024-03-27 06:13:29 -07:00
Awni Hannun
dc175f08d3
Fix race in multi-stream eval (#911)
* maybe fix race
* comment
2024-03-26 16:36:36 -07:00
Angelos Katharopoulos
29221fa238
Implement vjps for some primitives in the fast namespace (#883)
* Implement rope vjp in terms of rope
* RMSNormVJP primitive and kernel
* Add LayerNormVJP primitive and kernel
2024-03-26 16:35:34 -07:00
Cheng
a789685c63
Remove duplicate defines of StreamOrDevice and is_big_endian (#892)
2024-03-26 15:15:11 -07:00
Jagrit Digani
240d10699c
Implement negative padding in conv with slicing (#907)
* Implement negative padding with slicing
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni@apple.com>
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-26 14:59:19 -07:00
Jagrit Digani
925014b661
Fix multiblock sort limits (#906)
* Fix multiblock sort limits
* Fix metal validation error
2024-03-26 14:00:00 -07:00
Abdussamet Türker
5611e1a95e
Fix unsqueeze with None (#899)
* Fix unsqueeze with None
* Clean unnecessary files
2024-03-26 13:59:44 -07:00
Awni Hannun
570f2bf29e
pick up previously set attributes (#905)
2024-03-26 11:19:59 -07:00
Angelos Katharopoulos
9948eddf11
Fix nan and improve speed for qvm (#903)
2024-03-26 10:41:45 -07:00
Luca Arnaboldi
a3ee03da01
Fixing random.normal for half-precision dtype #642 (#904)
* Fixing random.normal for half-precision dtype #642
* Update python/tests/test_random.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-26 09:58:27 -07:00
Cheng
28fcd2b519
Add missing && when forwarding args (#894)
Without the &&, args would be copied and perfect forwarding would not work.
Also add template utils to make sure the function only forwards array
and not vector<array>.
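A minimal sketch of such a constraint. The `Array` stand-in, the `all_arrays_v` trait and the `collect` function are all hypothetical, and the actual MLX utilities may differ. The forwarding template only takes part in overload resolution when every argument decays to `Array`.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

struct Array {};  // hypothetical stand-in for mlx::core::array

// True only if every type in Ts decays to Array (true for an empty pack).
template <typename... Ts>
inline constexpr bool all_arrays_v =
    (std::is_same_v<std::decay_t<Ts>, Array> && ...);

// Perfect-forwarding collector, constrained to individual arrays.
// A std::vector<Array> argument removes this template from overload
// resolution instead of being silently copied into a new vector.
template <typename... Ts, typename = std::enable_if_t<all_arrays_v<Ts...>>>
std::vector<Array> collect(Ts&&... arrays) {
  return std::vector<Array>{std::forward<Ts>(arrays)...};
}
```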
2024-03-25 14:55:54 -07:00
Jack Mousseau
8e686764ac
Ensure shape dimensions are within supported integer range (#566) (#704)
* Ensure shape dimensions are within supported integer range (#566)
* fix build
* fix rebase bug
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-25 13:29:45 -07:00
Daniel Strobusch
479051ce1c
add numeric type hierarchy and issubdtype as well as a set_dtype meth… (#427)
* add numeric type hierarchy and issubdtype as well as a set_dtype method to nn.Module with predicate
The numeric type hierarchy and issubdtype are compatible with the [numpy hierarchy](220f0ab2c5/numpy/_core/numerictypes.py (L42) ).
Closes #285.
* nits in docs
* unify type category checking
* nits in docs
* nits in docs
* more docs nits
* fix callable type
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-25 12:32:59 -07:00
Awni Hannun
bfb5bad4f0
patch (#893)
2024-03-24 21:03:59 -07:00
Awni Hannun
1e16331d9c
post nanobind docs fixes and some updates (#889)
* post nanobind docs fixes and some updates
* one more doc nit
* fix for stubs and latex
2024-03-24 15:03:27 -07:00
Awni Hannun
be98f4ab6b
Reduce a little overhead (#871)
* some small overhead improvements
* use result_type in rms_norm
* remove release force
* fix + use non-vector version
* revert compile change
* fix ops
* a little more overhead
* a little more cleanup and overhead
2024-03-22 17:29:36 -07:00
Angelos Katharopoulos
6ee1112f30
Fix copy donation and add partial rope (#881)
2024-03-22 17:28:26 -07:00
Jagrit Digani
8e5a5a1ccd
Set item bug fix (#879)
* set item shaping bug fix
* Add extra tests
2024-03-22 12:11:17 -07:00
Angelos Katharopoulos
fcda3a0e66
Increase test tolerance for fast.layer_norm (#880)
2024-03-22 12:10:27 -07:00
Cheng
9663c22fe9
Do not store iostream in shared_ptr (#872)
There is no need to store iostream in shared_ptr; doing so adds the cost
of a heap allocation.
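A sketch of the alternative, with a hypothetical `LineReader` class. Standard streams are movable, so a class can hold one directly as a member. Doing so avoids the extra heap allocation and indirection that `std::shared_ptr<std::istream>` would add.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical reader that owns its stream by value. The class stays
// movable because std::istringstream is movable.
class LineReader {
 public:
  explicit LineReader(std::string text) : stream_(std::move(text)) {}

  // Reads the next line; returns false once the stream is exhausted.
  bool next(std::string& line) {
    return static_cast<bool>(std::getline(stream_, line));
  }

 private:
  std::istringstream stream_;
};
```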
2024-03-22 06:54:45 -07:00
Cheng
f0ae00da12
Reduce implicit copies in make_array (#874)
1. Move shapes into outputs instead of copying them.
2. Pass primitive by const ref as it is always copied into outputs, which
removes a copy when calling make_array.
2024-03-22 06:29:16 -07:00
Awni Hannun
44390bd3d0
Bump (#869)
* bump
* fix none in a few ops
2024-03-21 13:56:56 -07:00
Angelos Katharopoulos
2225374060
Adds mx.fast.layer_norm (#870)
2024-03-21 13:55:51 -07:00
nicolov
105d236889
Add vmap for SVD and inverse (#849)
2024-03-21 13:18:27 -07:00
Angelos Katharopoulos
53e6a9367c
Use reshape and transpose for non-overlapping pooling windows (#867)
2024-03-21 10:21:03 -07:00
Chime Ogbuji
f5a1582fe8
Add minimum for cosine decay function (#859)
* Add minimum for cosine decay function
* Update python/mlx/optimizers/schedulers.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-21 07:33:29 -07:00
Awni Hannun
a54f06b16f
Fast RMS Norm (#862)
* fast rmsnorm
* no rms gpu
* kernel
* fix shared mem
* looped rms and donation in softmax
* Make the squaring in float32 to avoid underflow
* Fix the default StreamOrDevice for rope and rms_norm in fast
* nits
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-21 07:20:54 -07:00
Cheng
4650d94d98
Add missing && in eval (#864)
Without the &&, args would be copied and perfect forwarding would not work.
To avoid eval calling itself recursively, the vector version of eval is
changed to take its argument by value instead, which also saves a copy
when an rvalue is passed.
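A minimal sketch of the overload set described above. It uses a hypothetical `Array` type and an `eval` that only counts its inputs; this is not the real MLX implementation. The comments explain why the by-value vector overload stops the variadic template from calling itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Array {};  // hypothetical stand-in for mlx::core::array

// Vector overload taken by value. With a `const std::vector<Array>&`
// parameter instead, a non-const lvalue or an rvalue vector would be a
// better match for the template below (deduced vector& / vector&&), so
// the template would keep calling itself. By value, both overloads are
// exact matches and the non-template one wins the tie.
inline std::size_t eval(std::vector<Array> outputs) { return outputs.size(); }

// Variadic overload: forwards individual arrays into a vector.
template <typename... Arrays>
std::size_t eval(Arrays&&... outputs) {
  return eval(std::vector<Array>{std::forward<Arrays>(outputs)...});
}
```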
2024-03-21 06:15:48 -07:00
Jagrit Digani
a5681ebc52
Update set item (#861)
* Update mlx_set_item to handle regular slices without expanding
* Refactor ellipsis handling
* Route mlx_set_item to slice_update where possible
* Update mlx_scatter_args_slice
* Don't route to gather if no array indices
2024-03-21 02:48:13 -07:00
Cheng
e849b3424a
Do not use static constexpr in header (#863)
Doing so results in each compilation unit (.cpp file) having its own
copy of the variable, while inline constexpr makes sure there is only
one copy.
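Illustrative only; the constant names and the header are hypothetical. The difference is about linkage, so it only becomes observable across several translation units. The single-file sketch below just marks which declaration behaves which way.

```cpp
#include <cassert>

// Imagine these lines in a header (say, a hypothetical "constants.h")
// that several .cpp files include.

// Internal linkage: each including translation unit gets its own copy,
// so the object can have a different address in every .cpp file.
static constexpr int kMaxDimsStatic = 8;

// C++17 inline variable: external linkage, and the linker keeps a single
// definition, so every translation unit refers to the same object.
inline constexpr int kMaxDims = 8;

// With `inline constexpr`, this returns the same pointer in every .cpp file.
inline const int* max_dims_address() { return &kMaxDims; }
```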
2024-03-20 21:28:05 -07:00
Jagrit Digani
b219d12a6b
Check edge case handling in row reduce med kernel (#858)
2024-03-20 11:37:58 -07:00
Jagrit Digani
cec8661113
Add a SliceUpdate op and primitive (#850)
* Enable copy to work with int64 strides
* Fix uniform buffer indices or copy kernel arguments
* Update utils.h
* Remove manual unrolling of elem to loc loop
* GPU copy updated to handle negative strides
* Add slice update primitive
2024-03-20 10:39:25 -07:00
Cheng
73a8c090e0
Pass shape and inputs by value in array's constructor (#853)
Since the shape and inputs are always saved as copies in ArrayDesc, we can
unify array's constructors to just take the arguments by value.
There are 2 cases:
1. When shape is an lvalue, it will be copied into array's constructor and
then moved into ArrayDesc's member. So only 1 copy happens.
2. When shape is an rvalue, it will be moved into array's constructor and
then moved into ArrayDesc's member. So no copy happens.
So having 1 constructor that takes by value is equivalent to having 2
constructors that take a const reference and an rvalue reference separately.
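A small sketch that checks the copy counts claimed above. It uses hypothetical stand-ins (`CountedShape`, `Desc`, `Arr`) rather than the real `array` and `ArrayDesc` classes.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical shape type that counts how many times it is copied.
struct CountedShape {
  static inline int copies = 0;
  std::vector<int> dims;
  explicit CountedShape(std::vector<int> d) : dims(std::move(d)) {}
  CountedShape(const CountedShape& other) : dims(other.dims) { ++copies; }
  CountedShape(CountedShape&&) noexcept = default;
};

// Stand-in for ArrayDesc, which owns the shape.
struct Desc {
  CountedShape shape;
};

// Stand-in for array: a single by-value constructor. The parameter is
// copied from an lvalue argument or moved from an rvalue argument, and
// is then moved into Desc's member.
struct Arr {
  Desc desc;
  explicit Arr(CountedShape shape) : desc{std::move(shape)} {}
};
```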
2024-03-20 07:54:30 -07:00
Md. Rasel Mandol
db6796ac61
simple typo fille (#848)
2024-03-19 06:15:17 -07:00
Awni Hannun
9a8ee00246
Switch to nanobind (#839)
* mostly builds
* most tests pass
* fix circle build
* add back buffer protocol
* includes
* fix for py38
* limit to cpu device
* include
* fix stubs
* move signatures for docs
* stubgen + docs fix
* doc for compiled function, comments
2024-03-18 20:12:25 -07:00
Cheng
d39ed54f8e
Some C++ code is not needed (#841)
1. Anonymous namespace means internal linkage, static keyword is not needed.
2. The default constructor of std::shared_ptr initializes the pointer to
nullptr, you don't need to explicitly set it.
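Both points in one small sketch, using the hypothetical names `helper` and `Holder`.

```cpp
#include <cassert>
#include <memory>

namespace {

// Everything in an unnamed namespace already has internal linkage,
// so writing `static int helper()` here would be redundant.
int helper() { return 42; }

}  // namespace

struct Holder {
  // A default-constructed std::shared_ptr is already empty (nullptr),
  // so an explicit `ptr(nullptr)` initializer adds nothing.
  std::shared_ptr<int> ptr;
};
```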
2024-03-18 17:04:10 -07:00
Awni Hannun
16546c70d8
No reshape rope (#838)
* no reshape rope
* no reshape rope
2024-03-18 17:03:07 -07:00
nicolov
eaba55c9bf
Add matrix inversion primitive (#822)
2024-03-15 06:34:36 -07:00
Awni Hannun
19ec023256
vmap matmul and addmm (#836)
2024-03-14 14:38:22 -07:00
Awni Hannun
63ab0ab580
version (#835)
2024-03-14 12:20:40 -07:00
Jagrit Digani
8dfc376c00
Strided reduce specialization for small reductions (#826)
* Add small column / general reduction specialization
2024-03-14 09:16:53 -07:00
Angelos Katharopoulos
1efee9db09
Add types and order in kernel name (#831)
2024-03-13 20:34:06 -07:00
Awni Hannun
43abc402d8
route to fallback (#828)
2024-03-13 19:56:04 -07:00
Angelos Katharopoulos
3f8b1668c4
Make reshape faster for row_contiguous cases (#829)
2024-03-13 16:22:03 -07:00
Angelos Katharopoulos
76c919b4ec
NumberOfElements for shapeless compile and vmap fixes (#802)
2024-03-13 10:34:14 -07:00
Angelos Katharopoulos
29d0c10ee5
Reshape improvement (#818)
2024-03-12 17:54:31 -07:00
Jagrit Digani
5ad133f8bb
No copy gemms (#801)
* Enable collapsing batch dims in gemm
* Update gemm to only make copies when neither of the last 2 axes are contiguous
* Update addmm to support gemv shapes
* Update addmm to support irregular batch strides
* Update tests
2024-03-12 13:13:41 -07:00
nicolov
d0c544a868
Add SVD primitive (#809)
Add SVD op using Accelerate's LAPACK following
https://developer.apple.com/documentation/accelerate/compressing_an_image_using_linear_algebra
Co-authored-by: Nicolo Valigi <nvaligi@apple.com>
2024-03-12 12:30:11 -07:00
Daniel Falbel
ffb19df3c0
Fix docstring for correct rendering (#820)
2024-03-12 11:46:44 -07:00
Awni Hannun
8b7532b9ab
fix scatter (#821)
2024-03-12 11:42:07 -07:00
Awni Hannun
366478c560
fix modules with dict (#819)
2024-03-12 08:54:06 -07:00
Justin Deschenaux
8e5600022a
Implement RNN, GRU, LSTM (#268)
* RNN base implementation
* Address comments+format
* nits in docs
* add tests for prb
* fix test
* add a couple tests
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-11 21:14:44 -07:00
Awni Hannun
0e95b64942
Fix bug in tape order during simplify (#816)
* fix bug in tape order during simplify
* properly fix compile
* last bug
2024-03-11 17:29:05 -07:00
nicolov
0ae22b915b
Remove code duplication in reduce ops (#793)
* Remove code duplication in reduce ops
* Remove the unnecessary lambda
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-11 10:57:07 -07:00
Awni Hannun
7c441600fe
Compile stride bug (#812)
* fix compile stride bug
* revert sdpa fix
* fix cpu
* fix bug with simplifying outputs
2024-03-11 06:31:31 -07:00
Awni Hannun
a4d290adb9
Remove depth traversal (#813)
* no depth traversal
* counter outside loop
2024-03-09 20:21:32 -08:00
Awni Hannun
28301807c2
Version bump and os error (#807)
2024-03-07 13:57:58 -08:00
Awni Hannun
74ed0974b3
Support 13.0+ with xcode 14.3 (#806)
* Support 13.0+ with xcode 14.3
* revert revert
2024-03-07 13:27:57 -08:00
Jagrit Digani
ec8a4864fa
Fix SDPA kernel bug on Mac OS 13.3 SDK (#805)
* Move sdpa kernel to allocate tgp mem statically and allow macOS 13.3 SDK builds
* Style
2024-03-07 10:18:09 -08:00
Awni Hannun
b7588fd5d7
fix inplace to not make a shallow copy (#804)
2024-03-07 09:34:11 -08:00
Awni Hannun
f512b905c7
Minimum xcode / sdk (#800)
* minimum xcode /sdk
* try multiple xcode versions in CI
* update python
* metal validation for python tests
2024-03-07 08:19:43 -08:00
Awni Hannun
afd5274049
route to fallback for bfloat (#794)
2024-03-06 15:39:12 -08:00
Awni Hannun
1074674e32
Add a maximum graph depth (#797)
* add a maximum graph depth
* remember how to use C++
2024-03-06 15:39:00 -08:00
AlexCheema
7762e07fde
Update function_transforms.rst (#796)
Fix typo in function_transforms.rst
2024-03-06 12:03:37 -08:00
Luca Arnaboldi
cbefd9129e
Implementation of pickle, copy and deepcopy for Python arrays (#300 & #367). (#713)
* Implemented pickling and copy for Python arrays (#300 & #367)
* Fixing typos
* Pickle with NumPy arrays
* Pickle: workaround for bfloat16
* Revert "Pickle: workaround for bfloat16"
This reverts commit 25afe6bc09.
* Added an error when pickling bfloat16
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* clang-format applied
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-06 08:02:41 -08:00
Angelos Katharopoulos
e39bebe13e
Fix reshaping of empty arrays (#791)
2024-03-05 23:33:22 -08:00
Angelos Katharopoulos
14b4e51a7c
Improved quantized matrix vector product (#786)
2024-03-05 17:32:19 -08:00
Awni Hannun
cbcf44a4ca
Some fixes in cache / thread safety (#777)
* some fixes in cache / thread safety
* speed up no cache case
* fix opt test
* optimizer docs
* optimizer docs
* fix adafactor
* fix adafactor
2024-03-05 13:30:50 -08:00
Awni Hannun
859ae15a54
Fix test (#785)
2024-03-04 23:02:27 -08:00
Brian Keene
0787724c44
Fast Inference SDPA op (#735)
* Fast Inference SDPA op
Implements metal shaders for:
o = mx.fast_inference_sdpa(queries, keys, values, scale, mask)
Supports fp16, fp32 dtypes; assumes d_k = 128.
Generic op support / prompt encoding supported via mlx primitives.
Metal implementation is for the inference use case only.
The majority of the performance benefit appears to result from GQA and
reduced bandwidth requirements; there is approximate performance parity
for the MHA use case (from some measurements on M3 Max).
* Flush shared memory to zero before unprotected reads for (scores @ values)
* Move to fast:: namespace, address reviewer comments
... also attempt to revert formatter auto-change for files not relevant
to this change
* Shared memory flush to top of kernel
* Resolve compiler warnings
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update docstring per PR feedback
* Softmax in higher precision, ...
* route to fallback for more use cases - batch size > 1, head_dim other
than 128, etc.
* Address linux build failure
* Address other reviewer comments
* Remove extraneous eval_cpu function per review
---------
Co-authored-by: Atila Orhon <64497909+atiorh@users.noreply.github.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: atila <atiorh@icloud.com>
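For reference, here is the standard scaled dot-product attention behind the `o = mx.fast_inference_sdpa(queries, keys, values, scale, mask)` call quoted in this commit. This formula assumes that `mask` is an additive term applied to the scores before the softmax, and that the softmax runs over the key axis. With GQA, several query heads share one key/value head.

```latex
O = \operatorname{softmax}\!\left(s \, Q K^{\top} + M\right) V,
\qquad s = \texttt{scale},\quad
Q \in \mathbb{R}^{L_q \times d_k},\;
K \in \mathbb{R}^{L_k \times d_k},\;
V \in \mathbb{R}^{L_k \times d_v}
```

The customary choice is $s = 1/\sqrt{d_k}$; the kernel in this commit assumes $d_k = 128$.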
2024-03-04 21:06:11 -08:00
Awni Hannun
7b463ffb07
iOS compile (#784)
* try to fix build for ios
* skip cpu compile
* fix namespace
* fix namespace
* Use CMake for platform specific cpu compile
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-04 20:02:26 -08:00
Jagrit Digani
6686e61ca4
Reduce update (#783)
* Split reduction files to reduce compile times
* Add small and medium axis size specializations for row reductions
* Add non-row-reduction options for small and med kernels
2024-03-04 19:09:51 -08:00