Cheng
|
92ab6bdeb8
|
Fix shared library not exporting symbols on Windows (#1684)
* Fix shared library not exporting symbols on Windows
* Function name style
|
2024-12-10 13:59:14 -08:00 |
|
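On Windows, a function compiled into a DLL is not exported unless it is marked with `__declspec(dllexport)`. This is one common reason a shared library "builds but exports nothing". A typical pattern is a per-library export macro. The names `MYLIB_API` and `MYLIB_BUILDING_DLL` below are hypothetical, and how #1684 actually solves the problem (macro, `.def` file, or a CMake setting) is not stated here.

```cpp
#include <cassert>

// Hypothetical names: MYLIB_API / MYLIB_BUILDING_DLL are illustrative only.
// Defined here because this translation unit is part of the library build.
#define MYLIB_BUILDING_DLL

#if defined(_WIN32)
#if defined(MYLIB_BUILDING_DLL)
#define MYLIB_API __declspec(dllexport)  // put the symbol in the DLL export table
#else
#define MYLIB_API __declspec(dllimport)  // consumers import it from the DLL
#endif
#else
// ELF/Mach-O: keep the symbol visible even when built with -fvisibility=hidden
#define MYLIB_API __attribute__((visibility("default")))
#endif

// An exported function (hypothetical example).
MYLIB_API int add_one(int x) {
  return x + 1;
}
```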
Cheng
|
0070e360a1
|
Disable MSVC warnings (#1680)
|
2024-12-09 19:41:14 -08:00 |
|
Awni Hannun
|
40c62c1321
|
Use int64 stride everywhere (#1671)
* use int64 stride everywhere
* fix ext
* fix ext
* more shape + cleanup
* one more
* few more
|
2024-12-09 11:09:02 -08:00 |
|
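Using 64-bit strides matters once an array holds more than `INT32_MAX` elements, because the outer stride is a product of inner dimensions. The sketch below computes row-major strides in `int64_t`. The `Shape`/`Strides` aliases (32-bit dims, 64-bit strides) are assumptions for illustration and are not copied from the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative aliases (assumption): 32-bit dimensions, 64-bit strides.
using Shape = std::vector<int32_t>;
using Strides = std::vector<int64_t>;

// Row-major strides computed in 64-bit arithmetic so the running product of
// large dimensions cannot overflow a 32-bit integer.
Strides row_major_strides(const Shape& shape) {
  Strides strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * static_cast<int64_t>(shape[i + 1]);
  }
  return strides;
}
```

For a shape of `{4, 65536, 65536}` the outer stride is 2^32, which a 32-bit stride could not represent.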
Cheng
|
d0f471cff7
|
Using math defines requires switch in MSVC (#1665)
* Using math defines requires switch in MSVC
* Fix more math macros
* Fix type
* Remove _MSC_VER guard for math defines
|
2024-12-08 08:16:28 -08:00 |
|
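MSVC's `<cmath>` only defines `M_PI` and the other math constants when `_USE_MATH_DEFINES` is defined before the header is first included, either in source or as a compile switch such as `/D_USE_MATH_DEFINES`. Other toolchains ignore the macro, which is consistent with dropping the `_MSC_VER` guard. The function `deg_to_rad` is a hypothetical example.

```cpp
// Must come before the first inclusion of <cmath> (or be passed as
// /D_USE_MATH_DEFINES); harmless on non-MSVC toolchains.
#define _USE_MATH_DEFINES
#include <cassert>
#include <cmath>

// Degrees to radians using the M_PI macro (hypothetical helper).
double deg_to_rad(double deg) {
  return deg * M_PI / 180.0;
}
```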
Cheng
|
6f316b8bf5
|
Use int64_t instead of ssize_t (#1673)
|
2024-12-07 20:10:44 -08:00 |
|
Cheng
|
d92ea094f1
|
Use && instead of and (#1663)
* Use && instead of and
* Remove "and" in ops.cpp
|
2024-12-07 18:26:39 -08:00 |
|
Cheng
|
6ae5423b4a
|
Do not pass integers to isnan (#1664)
|
2024-12-07 18:26:23 -08:00 |
|
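The commit title suggests that integer arguments should not reach `std::isnan`, which MSVC does not appear to handle reliably for integral types. Only floating-point values can be NaN, so one portable approach (an assumption, not necessarily the one taken in #1664) is to branch at compile time. `is_nan` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Only floating-point values can be NaN, so integer types never reach
// std::isnan; the check is resolved at compile time (C++17).
template <typename T>
bool is_nan(T x) {
  if constexpr (std::is_floating_point_v<T>) {
    return std::isnan(x);
  } else {
    return false;
  }
}
```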
Cheng
|
9635cffdc8
|
Include io.h in MSVC for IO functions (#1661)
|
2024-12-07 18:26:06 -08:00 |
|
Cheng
|
96986fb362
|
Use auto* for pointers (#1662)
|
2024-12-07 18:25:40 -08:00 |
|
Cheng
|
3ceb341a75
|
Use correct complex type for MSVC (#1660)
|
2024-12-07 18:25:22 -08:00 |
|
Awni Hannun
|
69a2991614
|
allow compiling lambdas in C++ (#1650)
* allow compiling lambdas in C++
* fix test
* more tests
* auto detect capture-less lambda
|
2024-12-06 13:13:21 -08:00 |
|
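A lambda with no captures converts implicitly to a plain function pointer, while a capturing lambda does not, so that conversion is a common way to detect a capture-less lambda in C++. Whether #1650 uses exactly this check is an assumption. `is_captureless_v` and `to_function_pointer` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// True when F converts to the function pointer type R(*)(Args...), which
// holds for capture-less lambdas with that signature but not capturing ones.
template <typename F, typename R, typename... Args>
constexpr bool is_captureless_v = std::is_convertible_v<F, R (*)(Args...)>;

// Hypothetical helper: obtain a raw function pointer from a capture-less lambda.
template <typename Fn, typename F>
Fn to_function_pointer(F f) {
  static_assert(std::is_convertible_v<F, Fn>,
                "only capture-less lambdas convert to function pointers");
  return f;
}
```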
Awni Hannun
|
d0b6cb0425
|
More primitives for compiling with shapeless (#1653)
* more shapeless and more Shape
* more shape
* fix
* fix
|
2024-12-06 11:29:18 -08:00 |
|
Alex Barron
|
95c4a2e3af
|
add back conditionaltype (#1655)
|
2024-12-06 11:12:01 -08:00 |
|
Awni Hannun
|
fc88fd9097
|
Shape and Strides 1 / N (#1645)
* shape and stride type def
* more shape
|
2024-12-05 12:53:43 -08:00 |
|
Awni Hannun
|
c5b0928c1f
|
fix fallback (#1646)
|
2024-12-05 11:59:53 -08:00 |
|
Awni Hannun
|
e047fd977d
|
compile changes if stream changes (#1644)
|
2024-12-03 14:37:44 -08:00 |
|
Jagrit Digani
|
9d40e521d7
|
Stop matrix copies with new attention kernel (#1639)
|
2024-12-02 14:12:38 -08:00 |
|
Jesper Stemann Andersen
|
e4eeb4e910
|
Added missing unordered_map includes (#1635)
* Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fixes build on FreeBSD
|
2024-12-02 07:03:03 -08:00 |
|
Ikko Eltociear Ashimine
|
9bc2183a31
|
docs: update device.cpp (#1632)
unecessary -> unnecessary
|
2024-11-27 20:58:26 -08:00 |
|
Awni Hannun
|
d4b222b6d3
|
Fix some leaks and races (#1629)
* fix leak and fix potential race
* more leak fixes
* fix one more
|
2024-11-27 20:01:20 -08:00 |
|
Jesper Stemann Andersen
|
af2af818a6
|
Enables build for *-linux-musl (#1627)
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
|
2024-11-27 13:14:24 -08:00 |
|
Awni Hannun
|
211411faf2
|
fix large ops (#1620)
|
2024-11-24 09:17:10 -08:00 |
|
Alex Barron
|
6f7986d592
|
Cleaner qmv/qvm (#1616)
|
2024-11-22 11:14:08 -08:00 |
|
Jagrit Digani
|
02bec0bb6d
|
Matrix Attention kernel (#1610)
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
|
2024-11-22 10:34:05 -08:00 |
|
Alex Barron
|
c79f6a4a8c
|
3 and 6 bit quantization (#1613)
* Support 3 and 6 bit quantization
|
2024-11-22 10:22:13 -08:00 |
|
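3-bit values do not divide a byte evenly, but eight of them fill exactly 24 bits (three bytes). Similarly, four 6-bit values fill three bytes. The sketch packs and unpacks 3-bit codes that way. The bit layout (value `i` in bits `[3i, 3i+3)`, little-endian bytes) is an assumption and may differ from the layout MLX's kernels use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight 3-bit codes (0..7) into three bytes: code i occupies bits
// [3*i, 3*i + 3) of a 24-bit word stored little-endian (assumed layout).
std::array<uint8_t, 3> pack3(const std::array<uint8_t, 8>& v) {
  uint32_t word = 0;
  for (int i = 0; i < 8; ++i) {
    assert(v[i] < 8);  // each code must fit in 3 bits
    word |= static_cast<uint32_t>(v[i]) << (3 * i);
  }
  return {static_cast<uint8_t>(word & 0xFF),
          static_cast<uint8_t>((word >> 8) & 0xFF),
          static_cast<uint8_t>((word >> 16) & 0xFF)};
}

// Inverse of pack3: rebuild the 24-bit word, then extract each 3-bit field.
std::array<uint8_t, 8> unpack3(const std::array<uint8_t, 3>& b) {
  uint32_t word = b[0] | (uint32_t(b[1]) << 8) | (uint32_t(b[2]) << 16);
  std::array<uint8_t, 8> v{};
  for (int i = 0; i < 8; ++i) {
    v[i] = static_cast<uint8_t>((word >> (3 * i)) & 0x7);
  }
  return v;
}
```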
Awni Hannun
|
0c5eea226b
|
Reduce specializations (#1607)
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes
|
2024-11-21 19:53:00 -08:00 |
|
Awni Hannun
|
dcca0d7477
|
contiguous op / prim (#1612)
|
2024-11-21 19:51:49 -08:00 |
|
Awni Hannun
|
61d787726a
|
Fix view scalar bug segfault (#1603)
* fix view scalar bug
* fix view scalar bug
* one more fix
|
2024-11-19 10:54:05 -08:00 |
|
Angelos Katharopoulos
|
5e89aace9b
|
Fix concatenate vmap (#1600)
|
2024-11-19 10:44:04 -08:00 |
|
Awni Hannun
|
2419edd5b2
|
Faster indexing math in a few kernels (#1589)
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
|
2024-11-18 19:52:00 -08:00 |
|
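Much of the indexing cost in general (non-contiguous) kernels comes from converting a flat element index into a strided memory offset. Templating that conversion on the index type allows 32-bit unsigned arithmetic when the array is small enough, and 64-bit otherwise. The sketch below is a CPU illustration of that idea. `elem_to_loc` is written here from scratch, and it assumes non-negative strides when an unsigned `IdxT` is used.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map a flat row-major element index to a memory offset for an array with
// arbitrary strides. IdxT selects the arithmetic width: uint32_t when every
// offset fits (cheaper), int64_t otherwise. Assumes non-negative strides for
// unsigned IdxT.
template <typename IdxT>
IdxT elem_to_loc(IdxT elem,
                 const std::vector<int32_t>& shape,
                 const std::vector<int64_t>& strides) {
  IdxT loc = 0;
  for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i) {
    loc += (elem % shape[i]) * static_cast<IdxT>(strides[i]);
    elem /= shape[i];
  }
  return loc;
}
```

For example, a 3x2 contiguous array viewed transposed has shape `{2, 3}` and strides `{1, 2}`. Flat index 4 is position (1, 1), which lies at offset 1*1 + 1*2 = 3.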
Awni Hannun
|
bf481e8e5d
|
Fix sibling leak (#1590)
* add test
* fix + test
* fix fix
|
2024-11-18 19:17:01 -08:00 |
|
Awni Hannun
|
9d7fa6b8e6
|
Use osx deployment target to pick Metal version (#1595)
* choose metal based on deployment target rather than system version
* nit
* unused compile def
|
2024-11-18 19:16:49 -08:00 |
|
Angelos Katharopoulos
|
073076ac7d
|
2-Pass Sdpa Inference Kernel (#1597)
|
2024-11-18 17:31:53 -08:00 |
|
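The commit does not describe the kernel's decomposition. A common two-pass scheme for attention at inference time splits the keys into blocks. Pass 1 computes, per block, the running max, the sum of exponentials, and an unnormalized output. Pass 2 rescales every block to a shared max and normalizes once. The sketch shows that combine step for a single query with scalar values (head dimension 1). Treat it as an assumption about what "2-pass" means here. The names `Partial`, `attend_block`, and `combine` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-block result of pass 1.
struct Partial {
  float max;  // largest score in the block
  float sum;  // sum of exp(score - max) over the block
  float out;  // sum of exp(score - max) * value (not yet normalized)
};

// Pass 1: process keys [begin, end) independently of the other blocks.
Partial attend_block(const std::vector<float>& scores,
                     const std::vector<float>& values,
                     std::size_t begin, std::size_t end) {
  assert(begin < end && end <= scores.size() && scores.size() == values.size());
  Partial p{-std::numeric_limits<float>::infinity(), 0.0f, 0.0f};
  for (std::size_t i = begin; i < end; ++i) p.max = std::max(p.max, scores[i]);
  for (std::size_t i = begin; i < end; ++i) {
    float w = std::exp(scores[i] - p.max);
    p.sum += w;
    p.out += w * values[i];
  }
  return p;
}

// Pass 2: rescale each block to the global max, then normalize once.
float combine(const std::vector<Partial>& parts) {
  float m = -std::numeric_limits<float>::infinity();
  for (const auto& p : parts) m = std::max(m, p.max);
  float sum = 0.0f, out = 0.0f;
  for (const auto& p : parts) {
    float scale = std::exp(p.max - m);
    sum += p.sum * scale;
    out += p.out * scale;
  }
  return out / sum;
}
```

Splitting the keys across blocks gives the same result as a single softmax over all keys, up to floating-point rounding.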
Awni Hannun
|
9bd03dd9b4
|
More buffer donation with no-ops (#1591)
* more donation
* fix test
* fix build
|
2024-11-18 08:35:41 -08:00 |
|
Awni Hannun
|
6931f84412
|
fix dispatch threads for a few kernels (#1594)
|
2024-11-18 08:35:25 -08:00 |
|
xnorai
|
16ec0556a0
|
Allocate raw JSON metadata buffer on the heap, and limit its size (#1596)
* Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors
|
2024-11-18 07:22:51 -08:00 |
|
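A safetensors file begins with an 8-byte little-endian unsigned integer giving the size of the JSON header, followed by that many bytes of JSON. Reading the header into a heap buffer, and rejecting oversized lengths *before* allocating, avoids large stack buffers and untrusted multi-gigabyte allocations. In the sketch, the limit is a parameter rather than the specific value chosen in #1596. `read_header` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a safetensors-style JSON header: 8-byte little-endian length, then
// that many bytes of JSON. The size is validated before the heap allocation.
std::string read_header(std::istream& in, uint64_t max_size) {
  unsigned char len_bytes[8];
  if (!in.read(reinterpret_cast<char*>(len_bytes), 8)) {
    throw std::runtime_error("truncated header length");
  }
  uint64_t len = 0;
  for (int i = 7; i >= 0; --i) len = (len << 8) | len_bytes[i];
  if (len > max_size) {
    throw std::runtime_error("header too large");
  }
  std::string json(static_cast<std::size_t>(len), '\0');  // heap-allocated
  if (!in.read(json.data(), static_cast<std::streamsize>(len))) {
    throw std::runtime_error("truncated header");
  }
  return json;
}
```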
Awni Hannun
|
610af352d4
|
Dispatch bf16 at run time when using the JIT (#1584)
* Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h
|
2024-11-15 16:54:36 -08:00 |
|
Awni Hannun
|
b35f1e3c9c
|
fix donation in sdpa (#1587)
|
2024-11-13 17:21:13 -08:00 |
|
Awni Hannun
|
dfa0b9aab4
|
Cpu fast quantize (#1578)
* cpu quantize
* fix
|
2024-11-08 20:10:39 -08:00 |
|
Alex Barron
|
a4c47b0276
|
OOB QMV fix (#1579)
* fix oob access in qmv
* skip more
* fix small case
|
2024-11-08 17:59:45 -08:00 |
|
Alex Barron
|
111fefd5e9
|
Fix OOB access in qmv (#1577)
* fix oob access in qmv
* skip more
|
2024-11-08 15:41:30 -08:00 |
|
Awni Hannun
|
c1fe1ef081
|
Bfs width limit (#1568)
* width limit
* fix
* large limit
* put env vars in env namespace
|
2024-11-08 15:00:46 -08:00 |
|
Awni Hannun
|
91c0277356
|
fix per-example mask + docs in sdpa (#1574)
|
2024-11-08 11:51:15 -08:00 |
|
Awni Hannun
|
9f0d5c12fc
|
Fully wrap the command encoder (#1572)
* fully wrap the command encoder
* use consistent style + fix extensions
|
2024-11-08 11:50:21 -08:00 |
|
Awni Hannun
|
59247c2b62
|
add groups in conv2d (#1569)
|
2024-11-07 13:57:53 -08:00 |
|
Awni Hannun
|
9a3842a2d9
|
fix (#1566)
|
2024-11-06 17:10:33 -08:00 |
|
Awni Hannun
|
54f05e7195
|
Fix gather vmap (#1563)
* fix gather
* fix
|
2024-11-05 11:29:20 -08:00 |
|
Alex Barron
|
26be608470
|
Add split_k qvm for long context (#1564)
* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance
|
2024-11-05 11:25:19 -08:00 |
|
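With long contexts, the reduction dimension K of a vector-matrix product is large while N may be small, which leaves few independent outputs to spread across threadgroups. Split-K divides K into chunks that are computed independently and summed at the end. The sketch shows the idea on plain floats. Quantization and the GPU launch are omitted, and `vecmat_split_k` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// y = x^T W for x of length K and row-major W of shape K x N, computed by
// splitting K into `splits` chunks. Each chunk yields an independent partial
// result (on a GPU, a separate threadgroup); a final pass sums the partials.
std::vector<float> vecmat_split_k(const std::vector<float>& x,
                                  const std::vector<float>& w,
                                  std::size_t n, std::size_t splits) {
  const std::size_t k = x.size();
  assert(splits > 0 && w.size() == k * n);
  const std::size_t chunk = (k + splits - 1) / splits;
  std::vector<std::vector<float>> partials(splits, std::vector<float>(n, 0.0f));
  for (std::size_t s = 0; s < splits; ++s) {
    const std::size_t begin = s * chunk;
    const std::size_t end = std::min(k, begin + chunk);
    for (std::size_t i = begin; i < end; ++i)
      for (std::size_t j = 0; j < n; ++j)
        partials[s][j] += x[i] * w[i * n + j];
  }
  std::vector<float> y(n, 0.0f);  // final reduction over the splits
  for (const auto& p : partials)
    for (std::size_t j = 0; j < n; ++j) y[j] += p[j];
  return y;
}
```

The number of splits is a tuning knob: more splits expose more parallelism but add a larger final reduction.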
Angelos Katharopoulos
|
248431eb3c
|
Reductions update (#1351)
|
2024-11-04 22:25:16 -08:00 |
|
Awni Hannun
|
76f275b4df
|
error in rms for wrong size (#1562)
|
2024-11-04 13:24:02 -08:00 |
|