Commit Graph

1002 Commits

Author SHA1 Message Date
Awni Hannun
f599c11bc8
bump (#1931) 2025-03-05 13:16:53 -08:00
Angelos Katharopoulos
0792ff02ff
Only fail when 10 consecutive socket errors occur (#1928) 2025-03-05 13:16:19 -08:00
Alex Barron
fd0d63ba5b
Affine quant always in fp32 (#1925)
* do affine quant in fp32

* static cast
2025-03-04 17:50:19 -08:00
Abe Leininger
3835a428c5
Adds nuclear norm support (#1894)
* adjust norm unit test tolerance
2025-03-04 13:26:02 -08:00
Angelos Katharopoulos
9680f72cca
Add a multi optimizer (#1916) 2025-03-04 13:16:35 -08:00
Angelos Katharopoulos
a0737273d3
Allow debugging in distributed mode (#1920) 2025-03-04 13:01:10 -08:00
Awni Hannun
e613d0eaf0
SDPA support for small batch (over sequence) queries (#1922)
* batch query sdpa

* batch sdpa for query
2025-03-04 10:59:04 -08:00
Awni Hannun
6bcd6bcf70
fix donation in scan (#1917) 2025-03-03 11:30:59 -08:00
Awni Hannun
ba12e4999a
Use a heap for small sizes (#1911)
* use a heap for small sizes

* check if VM
2025-03-03 06:50:57 -08:00
Awni Hannun
4e7cd31d12
Fix slice data size (#1913)
* fix slice data size

* add test
2025-03-02 21:50:42 -08:00
Angelos Katharopoulos
5e6c130d93
RMS norm without scaling (#1915) 2025-02-28 20:26:57 -08:00
Angelos Katharopoulos
5d68082881
Ring docs (#1829) 2025-02-28 11:34:21 -08:00
Angelos Katharopoulos
607181644f
Add mlx.distributed_config script (#1902) 2025-02-28 11:16:39 -08:00
Jagrit Digani
89d327075f
Enabling fused attention for head dim 128 (#1899)
* Share KV smem

* Fix bfloat error

* Unroll O = S @ V loop

* Perf upgrade

* Remove commented out function

* Add -Wno-c++17-extensions flag to metal flags

* Add -Wno-c++17-extensions flag to metal extension flags
2025-02-26 10:02:06 -08:00
Angelos Katharopoulos
6bf00ef631
Fix ring of 2 and allow scalars in API (#1906) 2025-02-25 17:03:01 -08:00
Awni Hannun
7d042f17fe
Double for lapack (#1904)
* double for lapack ops

* add double support for lapack ops
2025-02-25 11:39:36 -08:00
Awni Hannun
28b8079e30
fix double type promotion (#1901) 2025-02-25 06:00:53 -08:00
Awni Hannun
7face5d9fd
fix cpu compile (#1897) 2025-02-24 14:10:30 -08:00
Awni Hannun
a44dc4bdb0
fix leaking objc (#1898) 2025-02-24 13:57:59 -08:00
Awni Hannun
2d0f384b6f
fix simd erf_inv (#1896) 2025-02-24 13:57:47 -08:00
Awni Hannun
8ff84b5c43
fix version and expose command queue getter (#1892) 2025-02-20 15:25:15 -08:00
Angelos Katharopoulos
10b271d963
Ring update (#1885) 2025-02-20 14:32:31 -08:00
Jesper Stemann Andersen
0ebc8a3d25
Fixed issue where Clang on FreeBSD failed to compile mlx/backend/cpu/quantized.cpp (#1890) 2025-02-20 12:02:12 -08:00
Awni Hannun
bbda0fdbdb
Allow non-square lu (#1889) 2025-02-20 08:13:23 -08:00
Jesper Stemann Andersen
c86422bdd4
Added mlx::core::version() returning std::string(MLX_VERSION) (#1819)
* Added version.h providing mlx::core::version() returning std::string(MLX_VERSION)

Also, added MLX_VERSION_MAJOR, MLX_VERSION_MINOR, MLX_VERSION_PATCH, MLX_VERSION_NUMERIC, and accompanying functions.

* Added version.h to mlx.h

* Changed version int functions to be constexpr

* Formatting

* Added handling of MLX_VERSION where only the prefix has major.minor.patch format

* Changed version function to be constexpr
2025-02-19 20:30:19 -08:00
Awni Hannun
c707b2b0a6
Limit compile buffers (#1887)
* limit compile buffers

* maybe not flaky test
2025-02-19 20:28:13 -08:00
Angelos Katharopoulos
78ba24c37d
Raise an exception in the rope op if input is integer (#1884) 2025-02-19 14:43:39 -08:00
Angelos Katharopoulos
1a2cb72030
Ensure linspace always contains start and stop (#1883) 2025-02-19 13:53:20 -08:00
Abe Leininger
344a29506e
Enforce triangular matrix form in tri_inv (#1876)
* fix tri_inv bug

* Revert "fix tri_inv bug"

This reverts commit b74b290201.

* Make sure that tri_inv returns a triangular matrix

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-02-19 12:42:33 -08:00
Angelos Katharopoulos
71de73a668
Fix convs by reverting #1803 (#1882) 2025-02-18 14:36:34 -08:00
Alex Barron
4c1dfa58b7
xor op on arrays (#1875) 2025-02-17 00:24:53 -08:00
Awni Hannun
5274c3c43f
compiler warnings are errors (#1870) 2025-02-17 00:07:49 -08:00
Angelos Katharopoulos
1762793989
Remove unused uniform (#1867) 2025-02-14 15:51:41 -08:00
Awni Hannun
6cec78d8f2
bump (#1866) 2025-02-14 13:09:34 -08:00
Jagrit Digani
2dc307f2e6
Winograd Update for Small batches (#1803)
* Build in padding to Winograd kernels
* Add new fused Winograd kernel
* Enable weight flipping in Winograd kernels
2025-02-14 13:08:13 -08:00
Awni Hannun
7aea5b1895
Allow dynamic ops per buffer based on dispatches and memory (#1864)
* Allow dynamic ops per buffer based on dispatches and memory

* add initial arch values
2025-02-13 19:18:22 -08:00
Ronan Collobert
9733e16496
fix function pointer (#1865) 2025-02-13 18:46:11 -08:00
Alex Barron
7f2d1024f3
add f8_e4m3 loading (#1859) 2025-02-13 17:10:03 -08:00
Awni Hannun
428f589364
Revert "More buffer donation in some cases (#1858)" (#1863)
This reverts commit d274ae77f2.
2025-02-13 14:21:44 -08:00
Alex Barron
5cd97f7ffe
Bitwise Inverse (#1862)
* add bitwise inverse

* add vmap + fix nojit

* inverse -> invert

* add to compile + remove unused
2025-02-13 08:44:14 -08:00
Awni Hannun
e425dc00c0
Faster small batch qmv (#1861)
* faster small batch qmv

* swap batch and block dims for qvm and qmv regular
2025-02-12 22:02:36 -08:00
Awni Hannun
d274ae77f2
More buffer donation in some cases (#1858)
* more donation

* fix

* add test
2025-02-12 19:41:37 -08:00
Alex Barron
55c5ac7820
fix int64 bug (#1860) 2025-02-12 19:23:46 -08:00
Angelos Katharopoulos
0145911bea
Fixes output donation for IO ops on the GPU (#1857) 2025-02-12 10:52:30 -08:00
Awni Hannun
0a5215693e
Fix grad copies (#1854)
* fix grad with copies

* add test

* add test
2025-02-11 15:26:42 -08:00
Awni Hannun
2a45056ba8
Cycle leak break (#1856)
* detect and break leaks in custom function

* detect and break leaks in custom function
2025-02-11 14:45:02 -08:00
Cheng
142b77751d
Fix compilation error on Windows (#1844) 2025-02-10 19:53:05 -08:00
Abe Leininger
a5ededf1c3
CPU LU factorization and linear solvers (#1451)
* linalg solve backend

* nits

* more nits + fix

* luf primitive and lu, solve, and solve_triangular backends

* changes / nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-10 12:32:24 -08:00
Franck Verrot
7df3f792a2
Ensure Conv2D and Conv3D's kernel sizes aren't trimmed (#1852)
Before the change, this snippet:

```
print(nn.Conv1d(1, 32, 3, padding=1))
print(nn.Conv2d(1, 32, (3, 3), padding=1))
print(nn.Conv3d(1, 32, (3, 3, 3), padding=1))
```

would output:

```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3,), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```

After the change, the output will be:

```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```
2025-02-10 06:27:01 -08:00
Angelos Katharopoulos
9eb7d7362f
Fix Split::vmap (#1845) 2025-02-08 09:22:13 -08:00