Awni Hannun | 29a620cab2 | No reshapes in quantized embedding (#1682) | 2024-12-09 18:57:38 -08:00
* no reshapes in quantized embedding
* fix inadvertant cast
* add tol

Cheng | 87d7a2520e | Use Py_ssize_t in python bindings (#1678) | 2024-12-09 12:59:19 -08:00
* Use Py_ssize_t in python bindings
* Args passed to std::max must be same type
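
The std::max note refers to template argument deduction: both arguments must deduce to the same type, so mixing Py_ssize_t with a plain int literal does not compile. A minimal sketch of the pattern (illustrative only, not the code from this PR):

    #include <algorithm>
    #include <cstddef>

    // Stand-in for CPython's Py_ssize_t (a signed size type).
    using Py_ssize_t = std::ptrdiff_t;

    Py_ssize_t clamp_index(Py_ssize_t idx) {
      // std::max(idx, 0) fails to compile: the arguments deduce to different
      // types (Py_ssize_t vs. int). Naming the template argument fixes it.
      return std::max<Py_ssize_t>(idx, 0);
    }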

Awni Hannun | 40c62c1321 | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00
* use int64 stride everywhere
* fix ext
* fix ext
* more shape + cleanup
* one more
* few more

Awni Hannun | 35b412c099 | Fix compile hasher for string constants. (#1677) | 2024-12-09 09:26:18 -08:00
* fix hash
* add test
* nit

Cheng | d0f471cff7 | Using math defines requires switch in MSVC (#1665) | 2024-12-08 08:16:28 -08:00
* Using math defines requires switch in MSVC
* Fix more math macros
* Fix type
* Remove _MSC_VER guard for math defines
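
The "switch" here is the _USE_MATH_DEFINES macro: MSVC's <cmath> only exposes constants such as M_PI when it is defined before the header is included (or passed as /D_USE_MATH_DEFINES). A minimal illustration, not the code from this PR:

    // Must appear before the first include of <cmath> on MSVC; the
    // equivalent compiler switch is /D_USE_MATH_DEFINES.
    #define _USE_MATH_DEFINES
    #include <cmath>

    double circle_area(double r) {
      return M_PI * r * r;  // M_PI is only visible with the define above
    }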

Cheng | 6f316b8bf5 | Use int64_t instead of ssize_t (#1673) | 2024-12-07 20:10:44 -08:00

Cheng | 7c10c93a1f | Convert filesystem path to std::string explicitly (#1672) | 2024-12-07 20:10:06 -08:00
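
On Windows, std::filesystem::path stores wchar_t natively and does not convert implicitly to std::string, so the conversion has to be spelled out via .string(). A minimal sketch of the pattern, not the code from this PR:

    #include <filesystem>
    #include <fstream>
    #include <string>

    void open_file(const std::filesystem::path& p) {
      // On MSVC, path::value_type is wchar_t, so `std::string s = p;` does
      // not compile; .string() performs the conversion explicitly.
      std::string name = p.string();
      std::ifstream in(name);
    }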

Cheng | d92ea094f1 | Use && instead of and (#1663) | 2024-12-07 18:26:39 -08:00
* Use && instead of and
* Remove "and" in ops.cpp

Cheng | 6ae5423b4a | Do not pass integers to isnan (#1664) | 2024-12-07 18:26:23 -08:00

Cheng | 9635cffdc8 | Include io.h in MSVC for IO functions (#1661) | 2024-12-07 18:26:06 -08:00

Cheng | 96986fb362 | Use auto* for pointers (#1662) | 2024-12-07 18:25:40 -08:00

Cheng | 3ceb341a75 | Use correct complex type for MSVC (#1660) | 2024-12-07 18:25:22 -08:00

Awni Hannun | 50fa705125 | patch bump (#1656) | 2024-12-06 13:16:19 -08:00

Awni Hannun | 69a2991614 | allow compiling lambdas in C++ (#1650) | 2024-12-06 13:13:21 -08:00
* allow compiling lambdas in C++
* fix test
* more tests
* auto detect capture-less lambda
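
As a rough sketch of what #1650 enables, assuming the mx::compile entry point of the MLX C++ API (see the PR for the actual interface), a lambda can now be handed to compile directly:

    #include <vector>

    #include "mlx/mlx.h"

    namespace mx = mlx::core;

    int main() {
      // A capture-less lambda passed straight to compile (the auto-detection
      // mentioned in the last bullet); previously a std::function was needed.
      auto fn = mx::compile([](const std::vector<mx::array>& inputs) {
        return std::vector<mx::array>{mx::add(mx::exp(inputs[0]), inputs[1])};
      });
      auto out = fn({mx::array(1.0f), mx::array(2.0f)});
      mx::eval(out);
      return 0;
    }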

mt_caret | fd3377dd1f | Support bias correction in Adam and AdamW optimizers (#1640) | 2024-12-06 12:13:34 -08:00
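
For reference, bias correction rescales Adam's moment estimates to undo their initialization at zero; the standard update from the Adam paper (not a quote of this PR's code) is:

    \hat{m}_t = m_t / (1 - \beta_1^t)
    \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_{t+1} = \theta_t - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)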

Awni Hannun | d0b6cb0425 | More primitives for compiling with shapeless (#1653) | 2024-12-06 11:29:18 -08:00
* more shapeless and more Shape
* more shape
* fix
* fix

Alex Barron | 95c4a2e3af | add back conditionaltype (#1655) | 2024-12-06 11:12:01 -08:00

Awni Hannun | bc2a29f033 | fix (#1654) | 2024-12-06 10:48:58 -08:00

Nripesh Niketan | 3bb5b4a302 | Chore: Add default language in pre-commit and bump hooks (#1652) | 2024-12-06 07:54:29 -08:00

Awni Hannun | fc88fd9097 | Shape and Strides 1 / N (#1645) | 2024-12-05 12:53:43 -08:00
* shape and stride type def
* more shape

Awni Hannun | c5b0928c1f | fix fallback (#1646) | 2024-12-05 11:59:53 -08:00

Awni Hannun | e047fd977d | compile changes if stream changes (#1644) | 2024-12-03 14:37:44 -08:00

Jagrit Digani | 9d40e521d7 | Stop matrix copies with new attention kernel (#1639) | 2024-12-02 14:12:38 -08:00

Alex Barron | 1445dcaa60 | let class predicate specify quantization parameters (#1638) | 2024-12-02 14:09:28 -08:00

Jesper Stemann Andersen | e4eeb4e910 | Added missing unordered_map includes (#1635) | 2024-12-02 07:03:03 -08:00
* Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fixes build on FreeBSD

Awni Hannun | aa86876813 | fix transformer decoder post norm LN (#1637) | 2024-12-02 07:02:17 -08:00

Jesper Stemann Andersen | 974bb54ab2 | CMake: Enabled using Accelerate on x86_64 / x64 (#1625) | 2024-11-28 10:55:45 -08:00
* CMake: Enabled using Accelerate on x86_64 / x64
  Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
* CMake: Removed superfluous MLX_BUILD_ARM

Ikko Eltociear Ashimine | 9bc2183a31 | docs: update device.cpp (#1632) | 2024-11-27 20:58:26 -08:00
unecessary -> unnecessary

Awni Hannun | d4b222b6d3 | Fix some leaks and races (#1629) | 2024-11-27 20:01:20 -08:00
* fix leak and fix potential race
* more leak fixes
* fix one more

Jesper Stemann Andersen | af2af818a6 | Enables build for *-linux-musl (#1627) | 2024-11-27 13:14:24 -08:00
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761

Jesper Stemann Andersen | 698e63a608 | CMake: Build with dlfcn-win32 to have dlopen etc. on win32 (#1628) | 2024-11-27 13:14:13 -08:00
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761

Awni Hannun | 211411faf2 | fix large ops (#1620) | 2024-11-24 09:17:10 -08:00

Awni Hannun | bb303c45a5 | version (#1617) | 2024-11-22 12:00:03 -08:00

Alex Barron | 6f7986d592 | Cleaner qmv /qvm (#1616) | 2024-11-22 11:14:08 -08:00

Awni Hannun | 7cbb4aef17 | Doc fix (#1615) | 2024-11-22 11:12:25 -08:00

Jagrit Digani | 02bec0bb6d | Matrix Attention kernel (#1610) | 2024-11-22 10:34:05 -08:00
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit

Alex Barron | c79f6a4a8c | 3 and 6 bit quantization (#1613) | 2024-11-22 10:22:13 -08:00
* Support 3 and 6 bit quantization
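
For a sense of scale, assuming the affine scheme with one scale and one bias per group of 64 weights, each stored in 16 bits (an assumption about the storage layout, not taken from this PR), the effective cost works out to:

    bits per weight = bits + (16 + 16) / group_size
    3-bit, group 64: 3 + 32/64 = 3.5 bits  (~4.6x smaller than fp16)
    6-bit, group 64: 6 + 32/64 = 6.5 bits  (~2.5x smaller than fp16)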

Awni Hannun | 0c5eea226b | Reduce specializations (#1607) | 2024-11-21 19:53:00 -08:00
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes

Awni Hannun | dcca0d7477 | contiguous op / prim (#1612) | 2024-11-21 19:51:49 -08:00

Cocoa | 0d5e7716ad | fix typo: accross -> across (#1609) | 2024-11-20 15:30:51 -08:00
Signed-off-by: Cocoa <i@uwucocoa.moe>

Angelos Katharopoulos | d8c824c594 | Formatting fixes (#1606) | 2024-11-20 15:30:36 -08:00

Saanidhya | cb431dfc9f | Adds 3D pooling (#1526) | 2024-11-19 16:45:24 -08:00

Awni Hannun | 61d787726a | Fix view scalar bug segfault (#1603) | 2024-11-19 10:54:05 -08:00
* fix view scalar bug
* fix view scalar bug
* one more fix

Angelos Katharopoulos | 5e89aace9b | Fix concatenate vmap (#1600) | 2024-11-19 10:44:04 -08:00

Awni Hannun | 2af7e8a9a6 | fix cmake version (#1601) | 2024-11-19 08:45:05 -08:00

Awni Hannun | 2419edd5b2 | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs

Awni Hannun | bf481e8e5d | Fix sibling leak (#1590) | 2024-11-18 19:17:01 -08:00
* add test
* fix + test
* fix fix

Awni Hannun | 9d7fa6b8e6 | Use osx deployment target to pick Metal version (#1595) | 2024-11-18 19:16:49 -08:00
* choose metal based on deployment target rather than system version
* nit
* unused compile def

Angelos Katharopoulos | 073076ac7d | 2-Pass Sdpa Inference Kernel (#1597) | 2024-11-18 17:31:53 -08:00

Awni Hannun | 9bd03dd9b4 | More buffer donation with no-ops (#1591) | 2024-11-18 08:35:41 -08:00
* more donation
* fix test
* fix build