Awni Hannun
|
aa86876813
|
fix transformer decoder post norm LN (#1637)
|
2024-12-02 07:02:17 -08:00 |
|
Jesper Stemann Andersen
|
974bb54ab2
|
CMake: Enabled using Accelerate on x86_64 / x64 (#1625)
* CMake: Enabled using Accelerate on x86_64 / x64
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
* CMake: Removed superfluous MLX_BUILD_ARM
|
2024-11-28 10:55:45 -08:00 |
|
Ikko Eltociear Ashimine
|
9bc2183a31
|
docs: update device.cpp (#1632)
unecessary -> unnecessary
|
2024-11-27 20:58:26 -08:00 |
|
Awni Hannun
|
d4b222b6d3
|
Fix some leaks and races (#1629)
* fix leak and fix potential race
* more leak fixes
* fix one more
|
2024-11-27 20:01:20 -08:00 |
|
Jesper Stemann Andersen
|
af2af818a6
|
Enables build for *-linux-musl (#1627)
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
|
2024-11-27 13:14:24 -08:00 |
|
Jesper Stemann Andersen
|
698e63a608
|
CMake: Build with dlfcn-win32 to have dlopen etc. on win32 (#1628)
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
|
2024-11-27 13:14:13 -08:00 |
|
Awni Hannun
|
211411faf2
|
fix large ops (#1620)
|
2024-11-24 09:17:10 -08:00 |
|
Awni Hannun
|
bb303c45a5
|
version (#1617)
|
2024-11-22 12:00:03 -08:00 |
|
Alex Barron
|
6f7986d592
|
Cleaner qmv/qvm (#1616)
|
2024-11-22 11:14:08 -08:00 |
|
Awni Hannun
|
7cbb4aef17
|
Doc fix (#1615)
|
2024-11-22 11:12:25 -08:00 |
|
Jagrit Digani
|
02bec0bb6d
|
Matrix Attention kernel (#1610)
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
|
2024-11-22 10:34:05 -08:00 |
|
Alex Barron
|
c79f6a4a8c
|
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
|
2024-11-22 10:22:13 -08:00 |
|
Awni Hannun
|
0c5eea226b
|
Reduce specializations (#1607)
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes
|
2024-11-21 19:53:00 -08:00 |
|
Awni Hannun
|
dcca0d7477
|
contiguous op / prim (#1612)
|
2024-11-21 19:51:49 -08:00 |
|
Cocoa
|
0d5e7716ad
|
fix typo: accross -> across (#1609)
Signed-off-by: Cocoa <i@uwucocoa.moe>
|
2024-11-20 15:30:51 -08:00 |
|
Angelos Katharopoulos
|
d8c824c594
|
Formatting fixes (#1606)
|
2024-11-20 15:30:36 -08:00 |
|
Saanidhya
|
cb431dfc9f
|
Adds 3D pooling (#1526)
|
2024-11-19 16:45:24 -08:00 |
|
Awni Hannun
|
61d787726a
|
Fix view scalar bug segfault (#1603)
* fix view scalar bug
* fix view scalar bug
* one more fix
|
2024-11-19 10:54:05 -08:00 |
|
Angelos Katharopoulos
|
5e89aace9b
|
Fix concatenate vmap (#1600)
|
2024-11-19 10:44:04 -08:00 |
|
Awni Hannun
|
2af7e8a9a6
|
fix cmake version (#1601)
|
2024-11-19 08:45:05 -08:00 |
|
Awni Hannun
|
2419edd5b2
|
Faster indexing math in a few kernels (#1589)
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
|
2024-11-18 19:52:00 -08:00 |
|
Awni Hannun
|
bf481e8e5d
|
Fix sibling leak (#1590)
* add test
* fix + test
* fix fix
|
2024-11-18 19:17:01 -08:00 |
|
Awni Hannun
|
9d7fa6b8e6
|
Use osx deployment target to pick Metal version (#1595)
* choose metal based on deployment target rather than system version
* nit
* unused compile def
|
2024-11-18 19:16:49 -08:00 |
|
Angelos Katharopoulos
|
073076ac7d
|
2-Pass Sdpa Inference Kernel (#1597)
|
2024-11-18 17:31:53 -08:00 |
|
Awni Hannun
|
9bd03dd9b4
|
More buffer donation with no-ops (#1591)
* more donation
* fix test
* fix build
|
2024-11-18 08:35:41 -08:00 |
|
Awni Hannun
|
6931f84412
|
fix dispatch threads for a few kernels (#1594)
|
2024-11-18 08:35:25 -08:00 |
|
xnorai
|
16ec0556a0
|
Allocate raw JSON metadata buffer on the heap, and limit its size (#1596)
* Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors
|
2024-11-18 07:22:51 -08:00 |
|
Awni Hannun
|
610af352d4
|
Dispatch bf16 at run time when using the JIT (#1584)
* Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h
|
2024-11-15 16:54:36 -08:00 |
|
Awni Hannun
|
b35f1e3c9c
|
fix donation in sdpa (#1587)
|
2024-11-13 17:21:13 -08:00 |
|
Awni Hannun
|
dfa0b9aab4
|
Cpu fast quantize (#1578)
* cpu quantize
* fix
|
2024-11-08 20:10:39 -08:00 |
|
Alex Barron
|
a4c47b0276
|
OOB QMV fix (#1579)
* fix oob access in qmv
* skip more
* fix small case
|
2024-11-08 17:59:45 -08:00 |
|
Alex Barron
|
111fefd5e9
|
Fix OOB access in qmv (#1577)
* fix oob access in qmv
* skip more
|
2024-11-08 15:41:30 -08:00 |
|
Awni Hannun
|
c1fe1ef081
|
Bfs width limit (#1568)
* width limit
* fix
* large limit
* put env vars in env namespace
|
2024-11-08 15:00:46 -08:00 |
|
Awni Hannun
|
8c34c9dac4
|
throw for invalid case and remove test (#1575)
|
2024-11-08 12:04:03 -08:00 |
|
Awni Hannun
|
91c0277356
|
fix per-example mask + docs in sdpa (#1574)
|
2024-11-08 11:51:15 -08:00 |
|
Awni Hannun
|
9f0d5c12fc
|
Fully wrap the command encoder (#1572)
* fully wrap the command encoder
* use consistent style + fix extensions
|
2024-11-08 11:50:21 -08:00 |
|
Awni Hannun
|
59247c2b62
|
add groups in conv2d (#1569)
|
2024-11-07 13:57:53 -08:00 |
|
Awni Hannun
|
9a3842a2d9
|
fix (#1566)
|
2024-11-06 17:10:33 -08:00 |
|
Alex Barron
|
726dbd9267
|
v0.20.0 (#1565)
|
2024-11-05 12:37:57 -08:00 |
|
Awni Hannun
|
54f05e7195
|
Fix gather vmap (#1563)
* fix gather
* fix
|
2024-11-05 11:29:20 -08:00 |
|
Alex Barron
|
26be608470
|
Add split_k qvm for long context (#1564)
* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance
|
2024-11-05 11:25:19 -08:00 |
|
Angelos Katharopoulos
|
248431eb3c
|
Reductions update (#1351)
|
2024-11-04 22:25:16 -08:00 |
|
Awni Hannun
|
76f275b4df
|
error in rms for wrong size (#1562)
|
2024-11-04 13:24:02 -08:00 |
|
Awni Hannun
|
f1951d6cce
|
Use fewer barriers (#1561)
* use fewer barriers
* comment
|
2024-11-04 10:26:49 -08:00 |
|
Angelos Katharopoulos
|
62f297b51d
|
Sdpa fix (#1558)
|
2024-11-02 21:25:46 -07:00 |
|
Awni Hannun
|
09bc32f62f
|
No extra reshape (#1557)
* no extra reshape
* lint
|
2024-11-02 19:07:20 -07:00 |
|
Chris Offner
|
46d8b16ab4
|
Fix vmap example in docs (#1556)
|
2024-11-02 17:44:14 -07:00 |
|
Chris Offner
|
42533931fa
|
Fix typo "it's" -> "its" (#1555)
|
2024-11-02 06:06:34 -07:00 |
|
Awni Hannun
|
9bd3a7102f
|
add python 3.13 to circle (#1553)
|
2024-11-01 20:55:35 -07:00 |
|
Alex Barron
|
9e516b71ea
|
Add dispatchThreads to custom kernel doc (#1551)
* add dispatchThreads info
* update
* add link
|
2024-11-01 13:07:48 -07:00 |
|