Anastasiia Filippova
|
515f104926
|
Min / max reductions (#2041)
|
2025-04-09 23:22:20 -07:00 |
|
Awni Hannun
|
00794c42bc
|
Fix causal mask sdpa vec (#2053)
* fix sdpa vector causal mask
* test
|
2025-04-08 09:11:23 -07:00 |
|
Awni Hannun
|
f2c85308c1
|
add a half simd gemm fallback (#2046)
* add a half simd gemm fallback
* nit
|
2025-04-07 09:31:29 -07:00 |
|
Jagrit Digani
|
8777fd104f
|
Depthwise Conv2D optimization (#2036)
- Add new specialized kernel for depthwise 2D convolutions with small kernels (kernel size <= 7) and small strides (stride <= 2)
- Add related tests
|
2025-04-03 09:42:04 -07:00 |
|
Awni Hannun
|
de5f38fd48
|
Custom logsumexp (#2028)
* initial custom logsumexp
* more tests
* comments + fix
|
2025-03-31 07:36:55 -07:00 |
|
Angelos Katharopoulos
|
ec2854b13a
|
Swap -inf for finite_minimum value (#2029)
|
2025-03-30 21:55:04 -07:00 |
|
Awni Hannun
|
28f39e9038
|
Log for complex numbers in Metal (#2025)
* Log for complex numbers in Metal
* fix log2
|
2025-03-30 17:04:38 -07:00 |
|
Awni Hannun
|
05d7118561
|
causal vector sdpa (#2018)
* causal vector sdpa
* get rid of memory threshold
|
2025-03-28 12:36:13 -07:00 |
|
Awni Hannun
|
98b901ad66
|
enable complex gemm (#2017)
|
2025-03-28 10:45:13 -07:00 |
|
Awni Hannun
|
5580b47291
|
iinfo and scalar overflow detection (#2009)
|
2025-03-27 19:54:56 -07:00 |
|
Awni Hannun
|
a84cc0123f
|
promote mask when needed (#1998)
|
2025-03-23 19:58:28 -07:00 |
|
Angelos Katharopoulos
|
4eef8102c9
|
Distributed layers (#1270)
|
2025-03-21 13:52:17 -07:00 |
|
Angelos Katharopoulos
|
69e4dd506b
|
Add a ring all gather (#1985)
|
2025-03-21 13:36:51 -07:00 |
|
Awni Hannun
|
2a980a76ce
|
Add stats and limit to common allocator and enable tests (#1988)
* add stats to common allocator and enable tests
* linux memory and default
* fix
|
2025-03-21 12:28:36 -07:00 |
|
Awni Hannun
|
4e1994e9d7
|
move memory APIs into top level mlx.core (#1982)
|
2025-03-21 07:25:12 -07:00 |
|
Awni Hannun
|
7b7e2352cd
|
fix malloc or wait deadlock (#1976)
|
2025-03-20 16:48:43 -07:00 |
|
Awni Hannun
|
005e7efa64
|
fix mask in sdpa (#1980)
* fix mask in sdpa
* fix attention mask
* Re-enable routing for array mask
---------
Co-authored-by: Jagrit Digani <digani@apple.com>
|
2025-03-20 14:53:12 -07:00 |
|
Jagrit Digani
|
b42d13ec84
|
Update attention tests to show diff, disable array masks (#1978)
|
2025-03-20 14:25:38 -07:00 |
|
Jagrit Digani
|
9adcd1a650
|
Support fused masking in Attention (#1924)
* Update API to allow mask='causal' in fast::sdpa
* Add fallback
* Update steel::AttnParams
* Fix typo
* WIP, basic causal
* Update tests
* Update benchmarking
* Update masking loop limits
* Add bool masking and update tests
* Update additive mask
* Update benchmarks
* Update benchmarks
* Update tests
* Update for bfloat error
* Update early exit
* Add random seed to tests
|
2025-03-20 11:01:32 -07:00 |
|
Awni Hannun
|
3c164fca8c
|
Fix multistream GPU deadlock (#1969)
* fix multistream GPU deadlock
* comments
|
2025-03-20 07:19:47 -07:00 |
|
Awni Hannun
|
c6ea2ba329
|
Use same accumulation precision in gemv as gemm (#1962)
* use same accumulation precision in gemv as gemm
* faster
* fix compile
|
2025-03-16 07:13:24 -07:00 |
|
Awni Hannun
|
2770a10240
|
fix grad with inplace updates (#1961)
|
2025-03-13 19:13:09 -07:00 |
|
Awni Hannun
|
32da94507a
|
fix vmap for flatten (#1955)
|
2025-03-11 10:42:22 -07:00 |
|
Awni Hannun
|
3c3e558c60
|
Support transposed head/seq for kv (#1950)
* support transposed head/seq for kv
* fix flaky test
* nit
|
2025-03-10 10:53:45 -07:00 |
|
Abe Leininger
|
3835a428c5
|
Adds nuclear norm support (#1894)
* adjust norm unit test tolerance
|
2025-03-04 13:26:02 -08:00 |
|
Angelos Katharopoulos
|
9680f72cca
|
Add a multi optimizer (#1916)
|
2025-03-04 13:16:35 -08:00 |
|
Awni Hannun
|
e613d0eaf0
|
SDPA support for small batch (over sequence) queries (#1922)
* batch query sdpa
* batch sdpa for query
|
2025-03-04 10:59:04 -08:00 |
|
Awni Hannun
|
6bcd6bcf70
|
fix donation in scan (#1917)
|
2025-03-03 11:30:59 -08:00 |
|
Awni Hannun
|
4e7cd31d12
|
Fix slice data size (#1913)
* fix slice data size
* add test
|
2025-03-02 21:50:42 -08:00 |
|
Angelos Katharopoulos
|
5e6c130d93
|
RMS norm without scaling (#1915)
|
2025-02-28 20:26:57 -08:00 |
|
Awni Hannun
|
7d042f17fe
|
Double for lapack (#1904)
* double for lapack ops
* add double support for lapack ops
|
2025-02-25 11:39:36 -08:00 |
|
Awni Hannun
|
28b8079e30
|
fix double type promotion (#1901)
|
2025-02-25 06:00:53 -08:00 |
|
Awni Hannun
|
7face5d9fd
|
fix cpu compile (#1897)
|
2025-02-24 14:10:30 -08:00 |
|
Awni Hannun
|
2d0f384b6f
|
fix simd erf_inv (#1896)
|
2025-02-24 13:57:47 -08:00 |
|
Angelos Katharopoulos
|
10b271d963
|
Ring update (#1885)
|
2025-02-20 14:32:31 -08:00 |
|
Awni Hannun
|
bbda0fdbdb
|
Allow non-square lu (#1889)
|
2025-02-20 08:13:23 -08:00 |
|
Awni Hannun
|
c707b2b0a6
|
Limit compile buffers (#1887)
* limit compile buffers
* maybe not flaky test
|
2025-02-19 20:28:13 -08:00 |
|
Angelos Katharopoulos
|
78ba24c37d
|
Raise an exception in the rope op if input is integer (#1884)
|
2025-02-19 14:43:39 -08:00 |
|
Angelos Katharopoulos
|
1a2cb72030
|
Ensure linspace always contains start and stop (#1883)
|
2025-02-19 13:53:20 -08:00 |
|
Abe Leininger
|
344a29506e
|
Enforce triangular matrix form in tri_inv (#1876)
* fix tri_inv bug
* Revert "fix tri_inv bug"
This reverts commit b74b290201.
* Make sure that tri_inv returns a triangular matrix
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
|
2025-02-19 12:42:33 -08:00 |
|
Angelos Katharopoulos
|
71de73a668
|
Fix convs by reverting #1803 (#1882)
|
2025-02-18 14:36:34 -08:00 |
|
Alex Barron
|
4c1dfa58b7
|
xor op on arrays (#1875)
|
2025-02-17 00:24:53 -08:00 |
|
Jagrit Digani
|
2dc307f2e6
|
Winograd Update for Small batches (#1803)
* Build in padding to Winograd kernels
* Add new fused Winograd kernel
* Enable weight flipping in Winograd kernels
|
2025-02-14 13:08:13 -08:00 |
|
Alex Barron
|
7f2d1024f3
|
add f8_e4m3 loading (#1859)
|
2025-02-13 17:10:03 -08:00 |
|
Awni Hannun
|
428f589364
|
Revert "More buffer donation in some cases (#1858)" (#1863)
This reverts commit d274ae77f2.
|
2025-02-13 14:21:44 -08:00 |
|
Alex Barron
|
5cd97f7ffe
|
Bitwise Inverse (#1862)
* add bitwise inverse
* add vmap + fix nojit
* inverse -> invert
* add to compile + remove unused
|
2025-02-13 08:44:14 -08:00 |
|
Awni Hannun
|
d274ae77f2
|
More buffer donation in some cases (#1858)
* more donation
* fix
* add test
|
2025-02-12 19:41:37 -08:00 |
|
Alex Barron
|
55c5ac7820
|
fix int64 bug (#1860)
|
2025-02-12 19:23:46 -08:00 |
|
Angelos Katharopoulos
|
0145911bea
|
Fixes output donation for IO ops on the GPU (#1857)
|
2025-02-12 10:52:30 -08:00 |
|
Awni Hannun
|
0a5215693e
|
Fix grad copies (#1854)
* fix grad with copies
* add test
* add test
|
2025-02-11 15:26:42 -08:00 |
|