952 Commits

Author SHA1 Message Date
Awni Hannun
1c0c118f7c
Fp64 on the CPU (#1843)
* add fp64 data type

* clean build

* update docs

* fix bug
2025-02-07 15:52:22 -08:00
Awni Hannun
1a1b2108ec
bump (#1840) 2025-02-06 11:53:24 -08:00
Jagrit Digani
b6c6552d20
Add missing #pragma once (#1838) 2025-02-06 11:11:22 -08:00
Awni Hannun
83a0340fa7
allow command (#1836) 2025-02-06 10:32:24 -08:00
Nripesh Niketan
a62fc1b39f
chore: pre-commit bump (#1837) 2025-02-06 08:55:01 -08:00
Awni Hannun
af1b725fda
Fix a couple of slicing bugs (#1827)
* fix a few bugs

* fix conv grad

* speedup test

* comment
2025-02-05 19:50:08 -08:00
Awni Hannun
9174606d4c
fix sort (#1835) 2025-02-05 17:16:27 -08:00
Awni Hannun
ca305afdbe
loading empty list is ok when strict = false (#1834) 2025-02-05 16:19:27 -08:00
Awni Hannun
fe5987b81d
faster sort (#1831) 2025-02-05 06:10:22 -08:00
Awni Hannun
a229c8cef0
don't duplicate malloc with custom kernel init (#1830) 2025-02-04 13:20:57 -08:00
Jesper Stemann Andersen
f6c0499b8d
Resolved ambiguity in mlx::core::take_along_axis (#1822)
* Resolved ambiguity in mlx::core::take_along_axis

Detected by GCC 10 on riscv64-linux-gnu.

* Formatted

* Removed superfluous parentheses in random_tests.cpp
2025-02-04 06:06:17 -08:00
Awni Hannun
1156c84e86
Refactor common into cpu specific and truly common (#1817)
* refactor

* fix extension example

* fix no-cpu
2025-02-03 15:58:02 -08:00
Awni Hannun
ec7c7def40
no line buffer for mpi jobs (#1825) 2025-02-03 12:02:15 -08:00
Jesper Stemann Andersen
2d8e667400
MinGW support (#1806)
* Changed /bin/bash to bash for generating compiling preamble

* Fix wrt jit_compiler mingw like msvc wrt. WEXITSTATUS

* Solved ambiguity wrt. bernoulli test shape

* Disabled distributed/ring on Windows

* Fixed jit_compiler command wrt. MinGW

* Extended jit_compiler patch wrt. WEXITSTATUS to FreeBSD
2025-02-01 12:40:06 -08:00
Awni Hannun
80c863b972
Remove accelerate/ (#1816)
* remove accelerate

* comments

* neon reduction
2025-02-01 07:18:26 -08:00
Angelos Katharopoulos
f5cc1eea72
Allow different value dimensions in sdpa_vector (#1811) 2025-01-31 20:58:59 -08:00
Awni Hannun
b7c9f1d38f
scatter axis + gather axis primitives (#1813)
* scatter axis + gather axis primitives

* add transforms

* comment
2025-01-31 20:48:08 -08:00
Awni Hannun
c6fc07f1f4
Unify CPU matmuls, remove unused accelerate conv (#1814)
* unify matmuls

* Update mlx/backend/common/matmul.cpp

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-01-31 14:43:37 -08:00
Angelos Katharopoulos
ded914f442
Small distributed launch helper (#1810) 2025-01-29 17:55:04 -08:00
Awni Hannun
4758c8baa1
Start to cleanup/unify accelerate and common back-ends (Part 1/N) (#1777)
* start to cleanup/unify accelerate and common back-ends

* more progress

* simplify

* add half type and allow infs in simd exp

* unify softmax + quantized, more dispatches to simd quantized mm

* add sin/cos, use simd in vector-scalar ops

* faster CPU vectorize quant

* faster erf/erfinv
2025-01-29 14:34:49 -08:00
Awni Hannun
7064fed1b1
Minor update on MPI docs (#1805) 2025-01-28 11:00:08 -08:00
Awni Hannun
1017ac4a9e
add dilation for conv 3d layers + test for 3d conv w/ dilation (#1802) 2025-01-28 06:17:07 -08:00
Angelos Katharopoulos
ccb61d7aae
Ring distributed backend (#1784) 2025-01-27 22:15:01 -08:00
Awni Hannun
2235dee906
catch stream errors earlier to avoid aborts (#1801) 2025-01-27 14:05:43 -08:00
Awni Hannun
28091aa1ff
allow build python lib without specifying path (#1799) 2025-01-27 11:22:35 -08:00
Awni Hannun
121d9a0702
Fix rope fallback to not upcast (#1797)
* fix rope fallback to not upcast

* Update mlx/fast.cpp

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>

---------

Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2025-01-26 19:07:21 -08:00
Nick
0cea88bcc5
Use @ matrix multiplication syntax to document matrix-matrix multiplication (#1793)
Co-authored-by: Nick Thompson <nicholas_a_thompson@apple.com>
2025-01-25 16:02:36 -08:00
Angelos Katharopoulos
72146fc4cd
Einsum ellipsis (#1788) 2025-01-25 01:28:03 -08:00
Awni Hannun
e6a7ab9675
non square qr (#1783) 2025-01-21 14:07:47 -08:00
Angelos Katharopoulos
1f4c127fb9
Move some kernels to get_template_definition (#1782) 2025-01-21 08:59:44 -08:00
Awni Hannun
90532b1f37
recompile when shapeless is different (#1776) 2025-01-20 21:07:10 -08:00
Awni Hannun
a8666a757a
fix shapeless compile on ubuntu24 (#1775) 2025-01-18 06:04:36 -08:00
Awni Hannun
a4667da1eb
Faster synchronization Fence primitive (#1773)
* try faster synchronization

move event

fixes

update bench

fix

fix

* non-functioning kernel

* try alternative fence

* cleanup barrier

* get rid of event_fence

* update benchmarks

* doc string in metal fence
2025-01-17 18:42:19 -08:00
Awni Hannun
0c259961ac
matmul jvps (#1772) 2025-01-17 10:36:26 -08:00
Awni Hannun
f288db8d34
Fix synchronization bug for in stream async works (#1768) 2025-01-15 06:07:34 -08:00
Awni Hannun
33421c1dd3
Limit grad recursion depth by not recursing through non-grad inputs (#1764)
* limit grad recursion depth

* add grad of module test
2025-01-14 14:33:18 -08:00
Nripesh Niketan
5cc5201914
feat: Add orthogonal initializer and corresponding tests (#1651)
* feat: Add orthogonal initializer and corresponding tests

* lint

* Add acknowledgements

* nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 07:29:20 -08:00
Awni Hannun
252e423e81
fix and cleanup event signal/wait for metal (#1765) 2025-01-10 18:37:26 -08:00
wrmsr
a4a2764a52
Fix broadcast_arrays python sig (#1763) 2025-01-10 12:33:26 -08:00
Cheng
ab8e832c18
0ul is not size_t on MSVC (#1762) 2025-01-10 12:33:11 -08:00
Angelos Katharopoulos
1ce0c0fcb0
Bump version (#1761) 2025-01-09 13:48:20 -08:00
Awni Hannun
657f466402
use sdpa and exportable functions in transformer multi head attention (#1760) 2025-01-09 13:11:55 -08:00
Alex Barron
c7b0300af5
Fix batched qmv bug (#1758) 2025-01-09 11:45:57 -08:00
Awni Hannun
da8c885784
Simplify removes no-ops from the tape (#1759)
* simplify removes no-ops from the tape

* comment
2025-01-09 11:23:19 -08:00
Awni Hannun
1ccaf80575
Dynamic broadcasting for shapeless compile/export (#1722)
* working towards dynamic broadcast

* shapeless broadcast

* fix build + nits

* use broadcast arrays in quantize matmul

* some cleanup / consistency

* mend

* some comments

* add vjp, jvp for broadcast axes
2025-01-09 11:04:24 -08:00
Cheng
ec36bfa317
Include command stdout in error message (#1756)
* Include command stdout in error message

* On Windows pclose returns the exit code
2025-01-08 07:17:03 -08:00
Cheng
b8f76f717a
Print exceptions in eval_cpu/eval_gpu and abort (#1754) 2025-01-08 06:31:09 -08:00
Awni Hannun
d1766f2c70
Add boolean mask support in vector SDPA (#1757) 2025-01-07 20:24:53 -08:00
Awni Hannun
516ded618b
Dynamic slicing (#1741)
* dynamic slice and slice update

* python bindings + tests + fix set item

* fix compile issue

* comment

* fix jit
2025-01-07 14:02:16 -08:00
Jesper Stemann Andersen
c9c81d0584
Added additional missing unordered_map include that fixes build on FreeBSD (#1755) 2025-01-07 08:27:55 -08:00