Awni Hannun
c707b2b0a6
Limit compile buffers ( #1887 )
* limit compile buffers
* maybe not flaky test
2025-02-19 20:28:13 -08:00
Angelos Katharopoulos
78ba24c37d
Raise an exception in the rope op if input is integer ( #1884 )
2025-02-19 14:43:39 -08:00
Angelos Katharopoulos
1a2cb72030
Ensure linspace always contains start and stop ( #1883 )
2025-02-19 13:53:20 -08:00
Abe Leininger
344a29506e
Enforce triangular matrix form in tri_inv ( #1876 )
* fix tri_inv bug
* Revert "fix tri_inv bug"
This reverts commit b74b290201.
* Make sure that tri_inv returns a triangular matrix
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-02-19 12:42:33 -08:00
Angelos Katharopoulos
71de73a668
Fix convs by reverting #1803 ( #1882 )
2025-02-18 14:36:34 -08:00
Alex Barron
4c1dfa58b7
xor op on arrays ( #1875 )
2025-02-17 00:24:53 -08:00
Awni Hannun
5274c3c43f
compiler warnings are errors ( #1870 )
2025-02-17 00:07:49 -08:00
Angelos Katharopoulos
1762793989
Remove unused uniform ( #1867 )
2025-02-14 15:51:41 -08:00
Awni Hannun
6cec78d8f2
bump ( #1866 )
2025-02-14 13:09:34 -08:00
Jagrit Digani
2dc307f2e6
Winograd Update for Small batches ( #1803 )
* Build in padding to Winograd kernels
* Add new fused Winograd kernel
* Enable weight flipping in Winograd kernels
2025-02-14 13:08:13 -08:00
Awni Hannun
7aea5b1895
Allow dynamic ops per buffer based on dispatches and memory ( #1864 )
* Allow dynamic ops per buffer based on dispatches and memory
* add initial arch values
2025-02-13 19:18:22 -08:00
Ronan Collobert
9733e16496
fix function pointer ( #1865 )
2025-02-13 18:46:11 -08:00
Alex Barron
7f2d1024f3
add f8_e4m3 loading ( #1859 )
2025-02-13 17:10:03 -08:00
Awni Hannun
428f589364
Revert "More buffer donation in some cases ( #1858 )" ( #1863 )
This reverts commit d274ae77f2.
2025-02-13 14:21:44 -08:00
Alex Barron
5cd97f7ffe
Bitwise Inverse ( #1862 )
* add bitwise inverse
* add vmap + fix nojit
* inverse -> invert
* add to compile + remove unused
2025-02-13 08:44:14 -08:00
Awni Hannun
e425dc00c0
Faster small batch qmv ( #1861 )
* faster small batch qmv
* swap batch and block dims for qvm and qmv regular
2025-02-12 22:02:36 -08:00
Awni Hannun
d274ae77f2
More buffer donation in some cases ( #1858 )
* more donation
* fix
* add test
2025-02-12 19:41:37 -08:00
Alex Barron
55c5ac7820
fix int64 bug ( #1860 )
2025-02-12 19:23:46 -08:00
Angelos Katharopoulos
0145911bea
Fixes output donation for IO ops on the GPU ( #1857 )
2025-02-12 10:52:30 -08:00
Awni Hannun
0a5215693e
Fix grad copies ( #1854 )
* fix grad with copies
* add test
* add test
2025-02-11 15:26:42 -08:00
Awni Hannun
2a45056ba8
Cycle leak break ( #1856 )
* detect and break leaks in custom function
* detect and break leaks in custom function
2025-02-11 14:45:02 -08:00
Cheng
142b77751d
Fix compilation error on Windows ( #1844 )
2025-02-10 19:53:05 -08:00
Abe Leininger
a5ededf1c3
CPU LU factorization and linear solvers ( #1451 )
* linalg solve backend
* nits
* more nits + fix
* luf primitive and lu, solve, and solve_triangular backends
* changes / nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-02-10 12:32:24 -08:00
Franck Verrot
7df3f792a2
Ensure Conv2D and Conv3D's kernel sizes aren't trimmed ( #1852 )
Before the change, this snippet:
```
print(nn.Conv1d(1, 32, 3, padding=1))
print(nn.Conv2d(1, 32, (3, 3), padding=1))
print(nn.Conv3d(1, 32, (3, 3, 3), padding=1))
```
would output:
```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3,), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```
After the change, the output will be:
```
Conv1d(1, 32, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, bias=True)
Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=1, groups=1, bias=True)
Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), dilation=1, bias=True)
```
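The trimmed tuples above look like an off-by-one when slicing the weight shape. The following is a minimal sketch of that idea, not the code from this commit: it assumes conv weights are laid out channels-last as `(out_channels, *kernel_dims, in_channels)`, and the helper name `kernel_size_from_weight_shape` is hypothetical.

```python
# Hypothetical helper, not MLX's actual implementation: recover the kernel
# size from a conv weight shape, assuming a channels-last layout of
# (out_channels, *kernel_dims, in_channels).
def kernel_size_from_weight_shape(shape):
    spatial = tuple(shape[1:-1])  # drop out_channels and in_channels
    if not spatial:
        raise ValueError("expected at least one spatial dimension")
    # Conv1d prints a bare int; Conv2d and Conv3d print the full tuple.
    return spatial[0] if len(spatial) == 1 else spatial


conv1d_kernel = kernel_size_from_weight_shape((32, 3, 1))
conv2d_kernel = kernel_size_from_weight_shape((32, 3, 3, 1))
conv3d_kernel = kernel_size_from_weight_shape((32, 3, 3, 3, 1))
print(conv1d_kernel, conv2d_kernel, conv3d_kernel)
```

Slicing with `[1:-1]` keeps every spatial dimension regardless of rank, whereas a fixed upper bound that is one too small would drop the last kernel dimension, as in the "before" output.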
2025-02-10 06:27:01 -08:00
Angelos Katharopoulos
9eb7d7362f
Fix Split::vmap ( #1845 )
2025-02-08 09:22:13 -08:00
Awni Hannun
1c0c118f7c
Fp64 on the CPU ( #1843 )
* add fp64 data type
* clean build
* update docs
* fix bug
2025-02-07 15:52:22 -08:00
Awni Hannun
1a1b2108ec
bump ( #1840 )
2025-02-06 11:53:24 -08:00
Jagrit Digani
b6c6552d20
Add missing #pragma once ( #1838 )
2025-02-06 11:11:22 -08:00
Awni Hannun
83a0340fa7
allow command ( #1836 )
2025-02-06 10:32:24 -08:00
Nripesh Niketan
a62fc1b39f
chore: pre-commit bump ( #1837 )
2025-02-06 08:55:01 -08:00
Awni Hannun
af1b725fda
Fix a couple of slicing bugs ( #1827 )
* fix a few bugs
* fix conv grad
* speedup test
* comment
2025-02-05 19:50:08 -08:00
Awni Hannun
9174606d4c
fix sort ( #1835 )
2025-02-05 17:16:27 -08:00
Awni Hannun
ca305afdbe
loading empty list is ok when strict = false ( #1834 )
2025-02-05 16:19:27 -08:00
Awni Hannun
fe5987b81d
faster sort ( #1831 )
2025-02-05 06:10:22 -08:00
Awni Hannun
a229c8cef0
don't duplicate malloc with custom kernel init ( #1830 )
2025-02-04 13:20:57 -08:00
Jesper Stemann Andersen
f6c0499b8d
Resolved ambiguity in mlx::core::take_along_axis ( #1822 )
* Resolved ambiguity in mlx::core::take_along_axis
Detected by GCC 10 on riscv64-linux-gnu.
* Formatted
* Removed superfluous parentheses in random_tests.cpp
2025-02-04 06:06:17 -08:00
Awni Hannun
1156c84e86
Refactor common into cpu specific and truly common ( #1817 )
* refactor
* fix extension example
* fix no-cpu
2025-02-03 15:58:02 -08:00
Awni Hannun
ec7c7def40
no line buffer for mpi jobs ( #1825 )
2025-02-03 12:02:15 -08:00
Jesper Stemann Andersen
2d8e667400
MinGW support ( #1806 )
* Changed /bin/bash to bash for generating compiling preamble
* Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS
* Solved ambiguity wrt. bernoulli test shape
* Disabled distributed/ring on Windows
* Fixed jit_compiler command wrt. MinGW
* Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD
2025-02-01 12:40:06 -08:00
Awni Hannun
80c863b972
Remove accelerate/ ( #1816 )
* remove accelerate
* comments
* neon reduction
2025-02-01 07:18:26 -08:00
Angelos Katharopoulos
f5cc1eea72
Allow different value dimensions in sdpa_vector ( #1811 )
2025-01-31 20:58:59 -08:00
Awni Hannun
b7c9f1d38f
scatter axis + gather axis primitives ( #1813 )
* scatter axis + gather axis primitives
* add transforms
* comment
2025-01-31 20:48:08 -08:00
Awni Hannun
c6fc07f1f4
Unify CPU matmuls, remove unused accelerate conv ( #1814 )
* unify matmuls
* Update mlx/backend/common/matmul.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-01-31 14:43:37 -08:00
Angelos Katharopoulos
ded914f442
Small distributed launch helper ( #1810 )
2025-01-29 17:55:04 -08:00
Awni Hannun
4758c8baa1
Start to cleanup/unify accelerate and common back-ends (Part 1/N) ( #1777 )
* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv
2025-01-29 14:34:49 -08:00
Awni Hannun
7064fed1b1
Minor update on MPI docs ( #1805 )
2025-01-28 11:00:08 -08:00
Awni Hannun
1017ac4a9e
add dilation for conv 3d layers + test for 3d conv w/ dilation ( #1802 )
2025-01-28 06:17:07 -08:00
Angelos Katharopoulos
ccb61d7aae
Ring distributed backend ( #1784 )
2025-01-27 22:15:01 -08:00
Awni Hannun
2235dee906
catch stream errors earlier to avoid aborts ( #1801 )
2025-01-27 14:05:43 -08:00
Awni Hannun
28091aa1ff
allow build python lib without specifying path ( #1799 )
2025-01-27 11:22:35 -08:00