Awni Hannun
c6fc07f1f4
Unify CPU matmuls, remove unused accelerate conv ( #1814 )
...
* unify matmuls
* Update mlx/backend/common/matmul.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-01-31 14:43:37 -08:00
Angelos Katharopoulos
ded914f442
Small distributed launch helper ( #1810 )
2025-01-29 17:55:04 -08:00
Awni Hannun
4758c8baa1
Start to cleanup/unify accelerate and common back-ends (Part 1/N) ( #1777 )
...
* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv
2025-01-29 14:34:49 -08:00
Awni Hannun
7064fed1b1
Minor update on MPI docs ( #1805 )
2025-01-28 11:00:08 -08:00
Awni Hannun
1017ac4a9e
add dilation for conv 3d layers + test for 3d conv w/ dilation ( #1802 )
2025-01-28 06:17:07 -08:00
Angelos Katharopoulos
ccb61d7aae
Ring distributed backend ( #1784 )
2025-01-27 22:15:01 -08:00
Awni Hannun
2235dee906
catch stream errors earlier to avoid aborts ( #1801 )
2025-01-27 14:05:43 -08:00
Awni Hannun
28091aa1ff
allow build python lib without specifying path ( #1799 )
2025-01-27 11:22:35 -08:00
Awni Hannun
121d9a0702
Fix rope fallback to not upcast ( #1797 )
...
* fix rope fallback to not upcast
* Update mlx/fast.cpp
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-01-26 19:07:21 -08:00
Nick
0cea88bcc5
Use @ matrix multiplication syntax to document matrix-matrix multiplication ( #1793 )
...
Co-authored-by: Nick Thompson <nicholas_a_thompson@apple.com>
2025-01-25 16:02:36 -08:00
Angelos Katharopoulos
72146fc4cd
Einsum ellipsis ( #1788 )
2025-01-25 01:28:03 -08:00
Awni Hannun
e6a7ab9675
non square qr ( #1783 )
2025-01-21 14:07:47 -08:00
Angelos Katharopoulos
1f4c127fb9
Move some kernels to get_template_definition ( #1782 )
2025-01-21 08:59:44 -08:00
Awni Hannun
90532b1f37
recompile when shapeless is different ( #1776 )
2025-01-20 21:07:10 -08:00
Awni Hannun
a8666a757a
fix shapeless compile on ubuntu24 ( #1775 )
2025-01-18 06:04:36 -08:00
Awni Hannun
a4667da1eb
Faster synchronization Fence primitive ( #1773 )
...
* try faster synchronization
move event
fixes
update bench
fix
fix
* non-functioning kernel
* try alternative fence
* cleanup barrier
* get rid of event_fence
* update benchmarks
* doc string in metal fence
2025-01-17 18:42:19 -08:00
Awni Hannun
0c259961ac
matmul jvps ( #1772 )
2025-01-17 10:36:26 -08:00
Awni Hannun
f288db8d34
Fix synchronization bug for in stream async works ( #1768 )
2025-01-15 06:07:34 -08:00
Awni Hannun
33421c1dd3
Limit grad recursion depth by not recursing through non-grad inputs ( #1764 )
...
* limit grad recursion depth
* add grad of module test
2025-01-14 14:33:18 -08:00
Nripesh Niketan
5cc5201914
feat: Add orthogonal initializer and corresponding tests ( #1651 )
...
* feat: Add orthogonal initializer and corresponding tests
* lint
* Add acknowledgements
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 07:29:20 -08:00
Awni Hannun
252e423e81
fix and cleanup event signal/wait for metal ( #1765 )
2025-01-10 18:37:26 -08:00
wrmsr
a4a2764a52
Fix broadcast_arrays python sig ( #1763 )
2025-01-10 12:33:26 -08:00
Cheng
ab8e832c18
0ul is not size_t on MSVC ( #1762 )
2025-01-10 12:33:11 -08:00
Angelos Katharopoulos
1ce0c0fcb0
Bump version ( #1761 )
2025-01-09 13:48:20 -08:00
Awni Hannun
657f466402
use sdpa and exportable functions in transformer multi head attention ( #1760 )
2025-01-09 13:11:55 -08:00
Alex Barron
c7b0300af5
Fix batched qmv bug ( #1758 )
2025-01-09 11:45:57 -08:00
Awni Hannun
da8c885784
Simplify removes no-ops from the tape ( #1759 )
...
* simplify removes no-ops from the tape
* comment
2025-01-09 11:23:19 -08:00
Awni Hannun
1ccaf80575
Dynamic broadcasting for shapeless compile/export ( #1722 )
...
* working towards dynamic broadcast
* shapeless broadcast
* fix build + nits
* use broadcast arrays in quantize matmul
* some cleanup / consistency
* mend
* some comments
* add vjp, jvp for broadcast axes
2025-01-09 11:04:24 -08:00
Cheng
ec36bfa317
Include command stdout in error message ( #1756 )
...
* Include command stdout in error message
* On Windows pclose returns the exit code
2025-01-08 07:17:03 -08:00
Cheng
b8f76f717a
Print exceptions in eval_cpu/eval_gpu and abort ( #1754 )
2025-01-08 06:31:09 -08:00
Awni Hannun
d1766f2c70
Add boolean mask support in vector SDPA ( #1757 )
2025-01-07 20:24:53 -08:00
Awni Hannun
516ded618b
Dynamic slicing ( #1741 )
...
* dynamic slice and slice update
* python bindings + tests + fix set item
* fix compile issue
* comment
* fix jit
2025-01-07 14:02:16 -08:00
Jesper Stemann Andersen
c9c81d0584
Added additional missing unordered_map include that fixes build on FreeBSD ( #1755 )
2025-01-07 08:27:55 -08:00
Angelos Katharopoulos
545f84d905
Refactor distributed backend ( #1752 )
2025-01-06 17:33:15 -08:00
Awni Hannun
d5ec172c95
Allow boolean mask in sdpa ( #1753 )
...
* allow boolean mask in sdpa
* more permissive donation in ternary
2025-01-06 16:57:07 -08:00
Angelos Katharopoulos
25b3a3e541
Optionally specify names for arrays when exporting ( #1749 )
2025-01-06 13:07:46 -08:00
Awni Hannun
058d6ce683
mpi send use input as output ( #1750 )
...
* mpi send use input as output
* move earlier
2025-01-06 06:08:43 -08:00
Angelos Katharopoulos
eab93985b8
Update custom function docs ( #1748 )
2025-01-03 16:35:25 -08:00
Awni Hannun
b51d70a83c
export docs ( #1747 )
2025-01-03 15:04:17 -08:00
Awni Hannun
259025100e
Fix nd ternary on GPU ( #1746 )
2025-01-03 11:52:17 -08:00
Awni Hannun
c9d30aa6ac
MLX in C++ example ( #1736 )
...
* MLX in C++ example
* nits
* fix docs
2025-01-02 19:09:04 -08:00
Angelos Katharopoulos
8544b42007
Add namespace ( #1745 )
2025-01-02 16:49:23 -08:00
Awni Hannun
6fa0501387
Fix concatenate/slice_update vjp + reduce binary size ( #1735 )
...
* fix concatenate vjp + reduce binary size
* also cast in slice update
2025-01-02 16:36:33 -08:00
Awni Hannun
ae69cb15e9
shapeless compile in docs and partially shapeless reshape ( #1742 )
2025-01-02 16:24:42 -08:00
Awni Hannun
a64a8dfe45
fix extension ( #1740 )
2025-01-02 16:16:16 -08:00
Venkata Naga Aditya Datta Chivukula
491fa95b1f
Added Kronecker Product ( #1728 )
2025-01-02 16:00:34 -08:00
Danilo Peixoto
92ec632ad5
Fix Distributed Communication documentation ( #1731 )
...
* Add missing `size()` method call for group
2025-01-02 14:08:38 -08:00
Cheng
8ecdfb718b
Fix export.cpp compilation with MSVC ( #1737 )
2024-12-29 06:56:30 -08:00
Awni Hannun
4ba0c24a8f
Export / import functions to / from a file ( #1642 )
...
* export and import functions
* refactor + works for few primitives
* nit
* allow primitives with state
* nit
* nit
* simplify serialize / deserialize
* fix for constants
* python bindings
* maybe fix serialize failure case
* add example
* more primitives, training kind of works
* same result for python and c++
* some fixes
* fix export
* template it up
* some simplification
* rebase
* allow kwargs and multiple functions
* exporter
* more primitives for exporting
* deal with endianness
* handle invalid stream
* add docstring
2024-12-24 11:19:13 -08:00
Cheng
935c8c4bb1
Make mx.compile work on Windows ( #1697 )
...
* Invoke MSVC on Windows in mx.compile
* Export kernel symbol on MSVC
* Remove unused template
* Parse env pairs in a robust way
* No need of cassert
* Remove unnecessary helpers
* Fix right trim
* Move command building to a separate file
* Missing header
* Do not pollute cwd with cl.exe
* Simplify str concat
* Pass output dir
* Fix styling
2024-12-24 07:02:33 -08:00