Awni Hannun
90532b1f37
recompile when shapeless is different ( #1776 )
2025-01-20 21:07:10 -08:00
Awni Hannun
a8666a757a
fix shapeless compile on ubuntu24 ( #1775 )
2025-01-18 06:04:36 -08:00
Awni Hannun
a4667da1eb
Faster synchronization Fence
primitive ( #1773 )
...
* try faster synchronization
move event
fixes
update bench
fix
fix
* non-functioning kernel
* try alternative fence
* cleanup barrier
* get rid of event_fence
* update benchmarks
* doc string in metal fence
2025-01-17 18:42:19 -08:00
Awni Hannun
0c259961ac
matmul jvps ( #1772 )
2025-01-17 10:36:26 -08:00
Awni Hannun
f288db8d34
Fix synchronization bug for in stream async works ( #1768 )
2025-01-15 06:07:34 -08:00
Awni Hannun
33421c1dd3
Limit grad recursion depth by not recursing through non-grad inputs ( #1764 )
...
* limit grad recursion depth
* add grad of module test
2025-01-14 14:33:18 -08:00
Nripesh Niketan
5cc5201914
feat: Add orthogonal initializer and corresponding tests ( #1651 )
...
* feat: Add orthogonal initializer and corresponding tests
* lint
* Add acknowledgements
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 07:29:20 -08:00
Awni Hannun
252e423e81
fix and cleanup event signal/wait for metal ( #1765 )
2025-01-10 18:37:26 -08:00
wrmsr
a4a2764a52
Fix broadcast_arrays python sig ( #1763 )
2025-01-10 12:33:26 -08:00
Cheng
ab8e832c18
0ul is not size_t on MSVC ( #1762 )
2025-01-10 12:33:11 -08:00
Angelos Katharopoulos
1ce0c0fcb0
Bump version ( #1761 )
2025-01-09 13:48:20 -08:00
Awni Hannun
657f466402
use sdpa and exportable functions in transformer multi head attention ( #1760 )
2025-01-09 13:11:55 -08:00
Alex Barron
c7b0300af5
Fix batched qmv bug ( #1758 )
2025-01-09 11:45:57 -08:00
Awni Hannun
da8c885784
Simplify removes no-ops from the tape ( #1759 )
...
* simplify removes no-ops from the tape
* comment
2025-01-09 11:23:19 -08:00
Awni Hannun
1ccaf80575
Dynamic broadcasting for shapeless compile/export ( #1722 )
...
* working towards dynamic broadcast
* shapeless broadcast
* fix build + nits
* use broadcast arrays in quantize matmul
* some cleanup / consistency
* mend
* some comments
* add vjp, jvp for broadcast axes
2025-01-09 11:04:24 -08:00
Cheng
ec36bfa317
Include command stdout in error message ( #1756 )
...
* Include command stdout in error message
* On Windows pclose returns the exit code
2025-01-08 07:17:03 -08:00
Cheng
b8f76f717a
Print exceptions in eval_cpu/eval_gpu and abort ( #1754 )
2025-01-08 06:31:09 -08:00
Awni Hannun
d1766f2c70
Add boolean mask support in vector SDPA ( #1757 )
2025-01-07 20:24:53 -08:00
Awni Hannun
516ded618b
Dynamic slicing ( #1741 )
...
* dynamic slice and slice update
* python bindings + tests + fix set item
* fix compile issue
* comment
* fix jit
2025-01-07 14:02:16 -08:00
Jesper Stemann Andersen
c9c81d0584
Added additional missing unordered_map include that fixes build on FreeBSD ( #1755 )
2025-01-07 08:27:55 -08:00
Angelos Katharopoulos
545f84d905
Refactor distributed backend ( #1752 )
2025-01-06 17:33:15 -08:00
Awni Hannun
d5ec172c95
Allow boolean mask in sdpa ( #1753 )
...
* allow boolean mask in sdpa
* more permissive donation in ternary
2025-01-06 16:57:07 -08:00
Angelos Katharopoulos
25b3a3e541
Optionally specify names for arrays when exporting ( #1749 )
2025-01-06 13:07:46 -08:00
Awni Hannun
058d6ce683
mpi send use input as output ( #1750 )
...
* mpi send use input as output
* move earlier
2025-01-06 06:08:43 -08:00
Angelos Katharopoulos
eab93985b8
Update custom function docs ( #1748 )
2025-01-03 16:35:25 -08:00
Awni Hannun
b51d70a83c
export docs ( #1747 )
2025-01-03 15:04:17 -08:00
Awni Hannun
259025100e
Fix nd ternary on GPU ( #1746 )
2025-01-03 11:52:17 -08:00
Awni Hannun
c9d30aa6ac
MLX in C++ example ( #1736 )
...
* MLX in C++ example
* nits
* fix docs
2025-01-02 19:09:04 -08:00
Angelos Katharopoulos
8544b42007
Add namespace ( #1745 )
2025-01-02 16:49:23 -08:00
Awni Hannun
6fa0501387
Fix concatenate/slice_update vjp + reduce binary size ( #1735 )
...
* fix concatenate vjp + reduce binary size
* also cast in slice update
2025-01-02 16:36:33 -08:00
Awni Hannun
ae69cb15e9
shapeless compile in docs and partially shapeless reshape ( #1742 )
2025-01-02 16:24:42 -08:00
Awni Hannun
a64a8dfe45
fix extension ( #1740 )
2025-01-02 16:16:16 -08:00
Venkata Naga Aditya Datta Chivukula
491fa95b1f
Added Kronecker Product ( #1728 )
2025-01-02 16:00:34 -08:00
Danilo Peixoto
92ec632ad5
Fix Distributed Communication documentation ( #1731 )
...
* Add missing `size()` method call for group
2025-01-02 14:08:38 -08:00
Cheng
8ecdfb718b
Fix export.cpp compilation with MSVC ( #1737 )
2024-12-29 06:56:30 -08:00
Awni Hannun
4ba0c24a8f
Export / import functions to / from a file ( #1642 )
...
* export and import functions
* refactor + works for few primitives
* nit
* allow primitives with state
* nit
* nit
* simplify serialize / deserialize
* fix for constants
* python bindings
* maybe fix serialize failure case
* add example
* more primitives, training kind of works
* same result for python and c++
* some fixes
* fix export
* template it up
* some simplificatoin
* rebase
* allow kwargs and multiple functions
* exporter
* more primitives for exporting
* deal with endianness
* handle invalid stream
* add docstring
2024-12-24 11:19:13 -08:00
Cheng
935c8c4bb1
Make mx.compile work on Windows ( #1697 )
...
* Invoke MSVC on Windows in mx.compile
* Export kernel symbol on MSVC
* Remove unused template
* Parse env pairs in a robust way
* No need of cassert
* Remove unnecessary helpers
* Fix right trim
* Move command building to a separate file
* Missing header
* Do not pollute cwd with cl.exe
* Simplify str concat
* Pass output dir
* Fix styling
2024-12-24 07:02:33 -08:00
Valentin Roussellet
88f993da38
Explicit parentheses around some logical operators ( #1732 )
...
* fix some warnings
* format
2024-12-24 07:02:20 -08:00
Awni Hannun
ebfe64b92d
shapeless slice update and broadcast when possible ( #1727 )
2024-12-23 11:25:15 -08:00
Awni Hannun
0308e9af71
Allow offset to be an mx.array for mx.fast.rope
( #1724 )
...
* allow offset for rope
* comment
2024-12-19 15:51:44 -08:00
Awni Hannun
c3628eea49
Add mx.finfo
and use it when making causal mask ( #1726 )
...
* finfo
* fixes
* docs
2024-12-19 14:52:41 -08:00
Awni Hannun
e03f0372b1
More shape type ( #1705 )
...
* more shape type
* fix
2024-12-19 08:08:20 -08:00
Alex Barron
f17536af9c
More lenient mask type check in SDPA ( #1723 )
...
* check mask type
* require promotion
2024-12-18 19:41:38 -08:00
Cheng
ed4ec81bca
Link python extension with mlx statically on Windows ( #1716 )
...
* Link python extension with mlx statically on Windows
* More readable code
2024-12-18 19:26:04 -08:00
Awni Hannun
7480059306
track resource limit and throw if exceeded ( #1718 )
2024-12-18 18:45:58 -08:00
Awni Hannun
8bae22b0fa
fix deletion of non-evaled arrays with siblings ( #1714 )
2024-12-18 18:45:36 -08:00
Alex Barron
49c34c4161
check mask type ( #1721 )
2024-12-18 14:25:18 -08:00
Awni Hannun
5548fcc96d
fix synch race ( #1719 )
2024-12-18 12:25:16 -08:00
Cheng
070bd433ab
Shorter kernel name for Windows ( #1701 )
...
* Shorter kernel name for Windows
* Only hash the clipped part
2024-12-17 18:51:38 -08:00
Cheng
c8fb54951a
Define NOMINMAX before windows.h ( #1715 )
2024-12-17 18:51:24 -08:00