Commit Graph

1192 Commits

Author SHA1 Message Date
Jesper Stemann Andersen
522d8d3917
Added missing netinet/in.h include that fixes build on FreeBSD (#1997)
Defines IPPROTO_TCP.
2025-03-24 08:07:34 -07:00
Awni Hannun
a84cc0123f
promote mask when needed (#1998) 2025-03-23 19:58:28 -07:00
Andrey Velichkevich
f018e248cd
fix(backend): Include algorithm library in Allocator (#1992)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
2025-03-22 21:27:51 -07:00
Awni Hannun
cfd7237a80
fix docs (#1991) 2025-03-21 19:58:53 -07:00
Angelos Katharopoulos
4eef8102c9
Distributed layers (#1270) 2025-03-21 13:52:17 -07:00
Angelos Katharopoulos
69e4dd506b
Add a ring all gather (#1985) 2025-03-21 13:36:51 -07:00
Angelos Katharopoulos
25814a9458
Disable mpi on version mismatch (#1989) 2025-03-21 13:36:26 -07:00
Awni Hannun
2a980a76ce
Add stats and limit to common allocator and enable tests (#1988)
* add stats to common allocator and enable tests

* linux memory and default

* fix
2025-03-21 12:28:36 -07:00
Angelos Katharopoulos
d343782c8b
Cross platform libmpi loading (#1975) 2025-03-21 11:23:10 -07:00
Awni Hannun
4e1994e9d7
move memory APIs into top level mlx.core (#1982) 2025-03-21 07:25:12 -07:00
jiyzhang
65a38c452b
update the formula of smooth_l1_loss (#1986) 2025-03-21 06:25:23 -07:00
Awni Hannun
7b7e2352cd
fix malloc or wait deadlock (#1976) 2025-03-20 16:48:43 -07:00
Awni Hannun
1177d28395
patch bump (#1981) 2025-03-20 15:12:22 -07:00
Awni Hannun
005e7efa64
fix mask in sdpa (#1980)
* fix mask in sdpa

* fix attention mask

* Re-enable routing for array mask

---------

Co-authored-by: Jagrit Digani <digani@apple.com>
2025-03-20 14:53:12 -07:00
Jagrit Digani
b42d13ec84
Update attention tests to show diff, disable array masks (#1978) 2025-03-20 14:25:38 -07:00
Jagrit Digani
9adcd1a650
Support fused masking in Attention (#1924)
* Update API to allow mask='causal' in fast::sdpa

* Add fallback

* Update steel::AttnParams

* Fix typo

* WIP, basic causal

* Update tests

* Update benchmarking

* Update masking loop limits

* Add bool masking and update tests

* Update additive mask

* Update benchmarks

* Update benchmarks

* Update tests

* Update for bfloat error

* Update early exit

* Add random seed to tests
2025-03-20 11:01:32 -07:00
Awni Hannun
3c164fca8c
Fix multistream GPU deadlock (#1969)
* fix multistream GPU deadlock

* comments
2025-03-20 07:19:47 -07:00
jiyzhang
95e335db7b
Update smooth_l1_loss in losses.py (#1974)
According the definition of smooth_l1_loss, the line 

diff = predictions - targets

Should be updated to 

diff = mx.abs(predictions - targets)

After the modification, the result is consistent with PyTorch smooth_l1_loss
2025-03-19 20:19:02 -07:00
Awni Hannun
f90206ad74
Guard nullptr dereference (#1972)
* guard nullptr dereference

* comment
2025-03-19 16:24:10 -07:00
Chunyang Wen
3779150750
refactor: all use schedule (#1973) 2025-03-19 11:24:04 -07:00
Cheng
0a9777aa5c
Do not define MLX_VERSION globally (#1966) 2025-03-18 07:12:40 -07:00
Chunyang Wen
45ad06aac8
Fix typo; Fix lint warning when reuse the same name (#1968)
* Fix typo; Fix lint warning when reuse the same name

* Add missing period
2025-03-18 07:12:24 -07:00
Awni Hannun
c6ea2ba329
Use same accumulation precision in gemv as gemm (#1962)
* use same accumulation precision in gemv as gemm

* faster

* fix compile
2025-03-16 07:13:24 -07:00
Awni Hannun
2770a10240
fix grad with inplace updates (#1961) 2025-03-13 19:13:09 -07:00
Awni Hannun
d2a94f9e6a
Only compile warnings as errors for circle (#1957) 2025-03-12 13:08:19 -07:00
Awni Hannun
32da94507a
fix vmap for flatten (#1955) 2025-03-11 10:42:22 -07:00
Awni Hannun
736a340478
reduce binary size (#1952) 2025-03-11 06:30:44 -07:00
Awni Hannun
117e1355a2
fix copy for large arrays (#1953) 2025-03-10 15:04:25 -07:00
Awni Hannun
3c3e558c60
Support transposed head/seq for kv (#1950)
* support transposed head/seq for kv

* fix flaky test

* nit
2025-03-10 10:53:45 -07:00
Chunyang Wen
cffceda6ee
Add type hint for _extra_repr (#1948) 2025-03-10 06:05:36 -07:00
Chunyang Wen
048805ad2c
Remove unused modules (#1949) 2025-03-10 06:05:26 -07:00
Chunyang Wen
d14c9fe7ea
Add file info when raising errors in save (#1943) 2025-03-08 14:51:04 -08:00
Chunyang Wen
5db90ce822
Fix obsured warning (#1944) 2025-03-08 14:50:39 -08:00
Chunyang Wen
d699cc1330
Fix unreachable warning (#1939)
* Fix unreachable warning

* Update error message
2025-03-07 17:23:04 -08:00
Awni Hannun
c4230747a1
redesign for faster cpu/gpu synch (#1869)
* redesign for faster cpu/gpu synch

* load + more async CPU

* use command encoder API and move more ops to use it

* make fence back-end generic + CPU only fence

* faster build

* fix async eval

* fixes + handle temporaries

* fix / improve cpu conv

* remove unused status, fix siblings

* fix extensions

* fix

* fix no cpu build

* format

* comments

* fix perf regression, remove unecessary abort

* fix events, task limit cpu

* fix waiting

* fix donation / temporaries in normalization
2025-03-06 19:23:38 -08:00
Awni Hannun
5245f12a46
always use json (#1938) 2025-03-06 15:35:56 -08:00
Chunyang Wen
a198b2787e
Remove unused modules (#1936) 2025-03-06 14:20:27 -08:00
Chunyang Wen
04edad8c59
Add doc string for path (#1937) 2025-03-06 14:20:09 -08:00
David Wisdom
392b3060b0
Fix typo in randint docstring (#1932)
This commit fixes a typo in the docstring for mlx.core.random.randint() by changing "roadcastable" to "broadcastable".
2025-03-05 21:48:00 -08:00
Chunyang Wen
85b34d59bc
Clean unused sys (#1929) 2025-03-05 13:48:03 -08:00
Awni Hannun
f599c11bc8
bump (#1931) 2025-03-05 13:16:53 -08:00
Angelos Katharopoulos
0792ff02ff
Only fail when 10 consecutive socket errors occur (#1928) 2025-03-05 13:16:19 -08:00
Alex Barron
fd0d63ba5b
Affine quant always in fp32 (#1925)
* do affine quant in fp32

* static cast
2025-03-04 17:50:19 -08:00
Abe Leininger
3835a428c5
Adds nuclear norm support (#1894)
* adjust norm unit test tolerance
2025-03-04 13:26:02 -08:00
Angelos Katharopoulos
9680f72cca
Add a multi optimizer (#1916) 2025-03-04 13:16:35 -08:00
Angelos Katharopoulos
a0737273d3
Allow debugging in distributed mode (#1920) 2025-03-04 13:01:10 -08:00
Awni Hannun
e613d0eaf0
SDPA support for small batch (over sequence) queries (#1922)
* batch query sdpa

* batch sdpa for query
2025-03-04 10:59:04 -08:00
Awni Hannun
6bcd6bcf70
fix donation in scan (#1917) 2025-03-03 11:30:59 -08:00
Awni Hannun
ba12e4999a
Use a heap for small sizes (#1911)
* use a heap for small sizes

* check if VM
2025-03-03 06:50:57 -08:00
Awni Hannun
4e7cd31d12
Fix slice data size (#1913)
* fix slice data size

* add test
2025-03-02 21:50:42 -08:00