Alex Barron
|
890fdd1ef0
|
start
|
2024-12-03 19:53:18 -08:00 |
|
Jagrit Digani
|
9d40e521d7
|
Stop matrix copies with new attention kernel (#1639)
|
2024-12-02 14:12:38 -08:00 |
|
Jesper Stemann Andersen
|
e4eeb4e910
|
Added missing unordered_map includes (#1635)
* Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fix the build on FreeBSD
|
2024-12-02 07:03:03 -08:00 |
|
Ikko Eltociear Ashimine
|
9bc2183a31
|
docs: update device.cpp (#1632)
unecessary -> unnecessary
|
2024-11-27 20:58:26 -08:00 |
|
Awni Hannun
|
d4b222b6d3
|
Fix some leaks and races (#1629)
* fix leak and fix potential race
* more leak fixes
* fix one more
|
2024-11-27 20:01:20 -08:00 |
|
Jesper Stemann Andersen
|
af2af818a6
|
Enables build for *-linux-musl (#1627)
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
|
2024-11-27 13:14:24 -08:00 |
|
Awni Hannun
|
211411faf2
|
fix large ops (#1620)
|
2024-11-24 09:17:10 -08:00 |
|
Alex Barron
|
6f7986d592
|
Cleaner qmv / qvm (#1616)
|
2024-11-22 11:14:08 -08:00 |
|
Jagrit Digani
|
02bec0bb6d
|
Matrix Attention kernel (#1610)
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
|
2024-11-22 10:34:05 -08:00 |
|
Alex Barron
|
c79f6a4a8c
|
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
|
2024-11-22 10:22:13 -08:00 |
|
Awni Hannun
|
0c5eea226b
|
Reduce specializations (#1607)
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes
|
2024-11-21 19:53:00 -08:00 |
|
Awni Hannun
|
dcca0d7477
|
contiguous op / prim (#1612)
|
2024-11-21 19:51:49 -08:00 |
|
Awni Hannun
|
61d787726a
|
Fix view scalar bug segfault (#1603)
* fix view scalar bug
* fix view scalar bug
* one more fix
|
2024-11-19 10:54:05 -08:00 |
|
Angelos Katharopoulos
|
5e89aace9b
|
Fix concatenate vmap (#1600)
|
2024-11-19 10:44:04 -08:00 |
|
Awni Hannun
|
2419edd5b2
|
Faster indexing math in a few kernels (#1589)
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
|
2024-11-18 19:52:00 -08:00 |
|
Awni Hannun
|
bf481e8e5d
|
Fix sibling leak (#1590)
* add test
* fix + test
* fix fix
|
2024-11-18 19:17:01 -08:00 |
|
Awni Hannun
|
9d7fa6b8e6
|
Use osx deployment target to pick Metal version (#1595)
* choose metal based on deployment target rather than system version
* nit
* unused compile def
|
2024-11-18 19:16:49 -08:00 |
|
Angelos Katharopoulos
|
073076ac7d
|
2-Pass Sdpa Inference Kernel (#1597)
|
2024-11-18 17:31:53 -08:00 |
|
Awni Hannun
|
9bd03dd9b4
|
More buffer donation with no-ops (#1591)
* more donation
* fix test
* fix build
|
2024-11-18 08:35:41 -08:00 |
|
Awni Hannun
|
6931f84412
|
fix dispatch threads for a few kernels (#1594)
|
2024-11-18 08:35:25 -08:00 |
|
xnorai
|
16ec0556a0
|
Allocate raw JSON metadata buffer on the heap, and limit its size (#1596)
* Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors
|
2024-11-18 07:22:51 -08:00 |
|
Awni Hannun
|
610af352d4
|
Dispatch bf16 at run time when using the JIT (#1584)
* Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h
|
2024-11-15 16:54:36 -08:00 |
|
Awni Hannun
|
b35f1e3c9c
|
fix donation in sdpa (#1587)
|
2024-11-13 17:21:13 -08:00 |
|
Awni Hannun
|
dfa0b9aab4
|
Cpu fast quantize (#1578)
* cpu quantize
* fix
|
2024-11-08 20:10:39 -08:00 |
|
Alex Barron
|
a4c47b0276
|
OOB QMV fix (#1579)
* fix oob access in qmv
* skip more
* fix small case
|
2024-11-08 17:59:45 -08:00 |
|
Alex Barron
|
111fefd5e9
|
Fix OOB access in qmv (#1577)
* fix oob access in qmv
* skip more
|
2024-11-08 15:41:30 -08:00 |
|
Awni Hannun
|
c1fe1ef081
|
Bfs width limit (#1568)
* width limit
* fix
* large limit
* put env vars in env namespace
|
2024-11-08 15:00:46 -08:00 |
|
Awni Hannun
|
91c0277356
|
fix per-example mask + docs in sdpa (#1574)
|
2024-11-08 11:51:15 -08:00 |
|
Awni Hannun
|
9f0d5c12fc
|
Fully wrap the command encoder (#1572)
* fully wrap the command encoder
* use consistent style + fix extensions
|
2024-11-08 11:50:21 -08:00 |
|
Awni Hannun
|
59247c2b62
|
add groups in conv2d (#1569)
|
2024-11-07 13:57:53 -08:00 |
|
Awni Hannun
|
9a3842a2d9
|
fix (#1566)
|
2024-11-06 17:10:33 -08:00 |
|
Awni Hannun
|
54f05e7195
|
Fix gather vmap (#1563)
* fix gather
* fix
|
2024-11-05 11:29:20 -08:00 |
|
Alex Barron
|
26be608470
|
Add split_k qvm for long context (#1564)
* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance
|
2024-11-05 11:25:19 -08:00 |
|
Angelos Katharopoulos
|
248431eb3c
|
Reductions update (#1351)
|
2024-11-04 22:25:16 -08:00 |
|
Awni Hannun
|
76f275b4df
|
error in rms for wrong size (#1562)
|
2024-11-04 13:24:02 -08:00 |
|
Awni Hannun
|
f1951d6cce
|
Use fewer barriers (#1561)
* use fewer barriers
* comment
|
2024-11-04 10:26:49 -08:00 |
|
Angelos Katharopoulos
|
62f297b51d
|
Sdpa fix (#1558)
|
2024-11-02 21:25:46 -07:00 |
|
Awni Hannun
|
57c6aa7188
|
fix multi output leak (#1548)
|
2024-10-31 09:32:01 -07:00 |
|
Awni Hannun
|
4f72c66911
|
improvements to scatter / gather (#1541)
|
2024-10-30 19:30:54 -07:00 |
|
Jagrit Digani
|
960e3f0f05
|
Gemm update (#1518)
|
2024-10-30 19:30:28 -07:00 |
|
Awni Hannun
|
884af42da2
|
Fix thread group for large arrays (#1543)
* fix thread group for large arrays
* comment
* one more
|
2024-10-30 16:25:12 -07:00 |
|
Alex Barron
|
048fabdabd
|
Fix vmap constant output size (#1524)
* use inputs to determine output size
* remove noop vmap tests
|
2024-10-30 16:16:53 -07:00 |
|
Carlo Cabrera
|
1a992e31e8
|
Skip using Residency sets in VMs (#1537)
* Skip using Residency sets in VMs
Attempting to use residency sets in a VM throws[^1]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [metal::Device] Unable to construct residency set.
Not quite sure if this is the best fix, but it does make the error go away.
Note that it was previously possible to run simple programs that used mlx in a VM prior to 0eb56d5be0. See related discussion at Homebrew/homebrew-core#195627.
[^1]: https://github.com/Homebrew/homebrew-core/actions/runs/11525831492/job/32105148462#step:3:56
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* change residency check
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
|
2024-10-29 19:37:23 -07:00 |
|
Awni Hannun
|
015c247393
|
change wino dispatch condition (#1534)
|
2024-10-28 11:13:44 -07:00 |
|
Awni Hannun
|
d3cd26820e
|
Faster bits and bernoulli (#1535)
* faster bits and bernoulli
* fix bernoulli
|
2024-10-28 11:11:00 -07:00 |
|
Awni Hannun
|
8e88e30d95
|
BFS graph evaluation order (#1525)
* bfs order
* try fix event issue
|
2024-10-25 10:27:19 -07:00 |
|
Awni Hannun
|
0eb56d5be0
|
Wired (#1510)
* expose residency sets as wire/unwire
* returns wired size
* fix
* runtime support check
* fix os check
* fix test
* fix no metal build
* docs
* nit
* nits in docs
* nits
|
2024-10-25 09:35:33 -07:00 |
|
Awni Hannun
|
dad1b00b13
|
fix (#1523)
|
2024-10-24 19:17:46 -07:00 |
|
Angelos Katharopoulos
|
c9b41d460f
|
Working 64-bit scans (#1506)
|
2024-10-24 11:05:46 -07:00 |
|
xnorai
|
32972a5924
|
C++20 compatibility for fmt (#1519)
* C++20 compatibility for fmt
* Address review feedback
* Remove stray string
* Add newlines back
|
2024-10-24 08:54:51 -07:00 |
|