Awni Hannun
8bae22b0fa
fix deletion of non-evaled arrays with siblings ( #1714 )
2024-12-18 18:45:36 -08:00
Alex Barron
49c34c4161
check mask type ( #1721 )
2024-12-18 14:25:18 -08:00
Awni Hannun
5548fcc96d
fix synch race ( #1719 )
2024-12-18 12:25:16 -08:00
Cheng
070bd433ab
Shorter kernel name for Windows ( #1701 )
...
* Shorter kernel name for Windows
* Only hash the clipped part
2024-12-17 18:51:38 -08:00
Cheng
c8fb54951a
Define NOMINMAX before windows.h ( #1715 )
2024-12-17 18:51:24 -08:00
Awni Hannun
d03c01dfbc
fix unflatten vjp ( #1708 )
2024-12-16 18:37:57 -08:00
Jesper Stemann Andersen
a82996e9fb
io/load: Enabled pread implementation for mingw32 ( #1706 )
2024-12-16 07:20:45 -08:00
Cheng
4768c61b57
Make sure gguf_ctx is closed when error happens ( #1699 )
2024-12-13 19:50:19 -08:00
Cheng
635117c5d4
Read/write files in binary mode ( #1698 )
2024-12-13 17:37:05 -08:00
Awni Hannun
50f3535693
Use expand_dims / unflatten / etc in more places ( #1696 )
...
* use expand_dims / unflatten in a couple more places
* few more
* few more
* fix
2024-12-12 17:00:44 -08:00
Awni Hannun
9111999af3
Fix small sort with metal validation ( #1695 )
2024-12-12 09:21:45 -08:00
Awni Hannun
6bd28d246e
Allow no copy negative strides in as_strided and slice ( #1688 )
...
* allow no copy negative strides in as_strided and slice
* fix jit
* fix jit
2024-12-12 08:59:45 -08:00
Cheng
4d595a2a39
Make compiled preamble work in MSVC ( #1675 )
...
* Make compiled preamble work in MSVC
* Remove logging
* Only use powershell for MSVC
2024-12-12 08:55:49 -08:00
Awni Hannun
4e1e9520e1
Flatten and unflatten ( #1692 )
...
* flatten and unflatten
* fix grad
* fix shape infer
* use squeeze + unsqueeze in get_item
2024-12-11 21:51:37 -08:00
Awni Hannun
f3dfa36a3a
Fix x86 tests ( #1691 )
...
* fix x86 tests
* comment
2024-12-11 07:47:18 -08:00
Awni Hannun
f76a49e555
ExpandDims
primitive (#1687 )
...
* add squeeze primitive
* simplify squeeze, use in gather
* fix
* fix
* fix
* fix
* fix no cpu
* use squeeze in matmul and friends
* expand dims primitive
* comment
2024-12-10 16:39:07 -08:00
Cheng
56db268f47
Provide a pread implementation for MSVC ( #1666 )
2024-12-10 15:55:53 -08:00
Cheng
92ab6bdeb8
Fix shared library not exporting symbols on Windows ( #1684 )
...
* Fix shared library not exporting symbols on Windows
* Function name style
2024-12-10 13:59:14 -08:00
Cheng
0070e360a1
Disable MSVC warnings ( #1680 )
2024-12-09 19:41:14 -08:00
Awni Hannun
40c62c1321
Use int64 stride everywhere ( #1671 )
...
* use int64 stride everywhere
* fix ext
* fix ext
* more shape + cleanup
* one more
* few more
2024-12-09 11:09:02 -08:00
Cheng
d0f471cff7
Using math defines requires switch in MSVC ( #1665 )
...
* Using math defines requires switch in MSVC
* Fix more math macros
* Fix type
* Remove _MSC_VER guard for math defines
2024-12-08 08:16:28 -08:00
Cheng
6f316b8bf5
Use int64_t instead of ssize_t ( #1673 )
2024-12-07 20:10:44 -08:00
Cheng
d92ea094f1
Use && instead of and ( #1663 )
...
* Use && instead of and
* Remove "and" in ops.cpp
2024-12-07 18:26:39 -08:00
Cheng
6ae5423b4a
Do not pass integers to isnan ( #1664 )
2024-12-07 18:26:23 -08:00
Cheng
9635cffdc8
Include io.h in MSVC for IO functions ( #1661 )
2024-12-07 18:26:06 -08:00
Cheng
96986fb362
Use auto* for pointers ( #1662 )
2024-12-07 18:25:40 -08:00
Cheng
3ceb341a75
Use correct complex type for MSVC ( #1660 )
2024-12-07 18:25:22 -08:00
Awni Hannun
69a2991614
allow compiling lambdas in C++ ( #1650 )
...
* allow compiling lambdas in C++
* fix test
* more tests
* auto detect capture-less lambda
2024-12-06 13:13:21 -08:00
Awni Hannun
d0b6cb0425
More primitives for compiling with shapeless ( #1653 )
...
* more shapeless and more Shape
* more shape
* fix
* fix
2024-12-06 11:29:18 -08:00
Alex Barron
95c4a2e3af
add back conditionaltype ( #1655 )
2024-12-06 11:12:01 -08:00
Awni Hannun
fc88fd9097
Shape and Strides 1 / N ( #1645 )
...
* shape and stride type def
* more shape
2024-12-05 12:53:43 -08:00
Awni Hannun
c5b0928c1f
fix fallback ( #1646 )
2024-12-05 11:59:53 -08:00
Awni Hannun
e047fd977d
compile changes if stream changes ( #1644 )
2024-12-03 14:37:44 -08:00
Jagrit Digani
9d40e521d7
Stop matrix copies with new attention kernel ( #1639 )
2024-12-02 14:12:38 -08:00
Jesper Stemann Andersen
e4eeb4e910
Added missing unordered_map includes ( #1635 )
...
* Added missing includes in mlx/io.h and mlx/backend/metal/metal.h
* Added additional missing unordered_map includes that fixes build on FreeBSD
2024-12-02 07:03:03 -08:00
Ikko Eltociear Ashimine
9bc2183a31
docs: update device.cpp ( #1632 )
...
unecessary -> unnecessary
2024-11-27 20:58:26 -08:00
Awni Hannun
d4b222b6d3
Fix some leaks and races ( #1629 )
...
* fix leak and fix potential race
* more leak fixes
* fix one more
2024-11-27 20:01:20 -08:00
Jesper Stemann Andersen
af2af818a6
Enables build for *-linux-musl ( #1627 )
...
Also contributes to being able to build for *-w64-mingw32.
Cf. https://github.com/JuliaPackaging/Yggdrasil/pull/9761
2024-11-27 13:14:24 -08:00
Awni Hannun
211411faf2
fix large ops ( #1620 )
2024-11-24 09:17:10 -08:00
Alex Barron
6f7986d592
Cleaner qmv
/qvm
( #1616 )
2024-11-22 11:14:08 -08:00
Jagrit Digani
02bec0bb6d
Matrix Attention kernel ( #1610 )
...
* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
2024-11-22 10:34:05 -08:00
Alex Barron
c79f6a4a8c
3 and 6 bit quantization ( #1613 )
...
* Support 3 and 6 bit quantization
2024-11-22 10:22:13 -08:00
Awni Hannun
0c5eea226b
Reduce specializations ( #1607 )
...
* start of reduce specializations
* fix all reduce
* fix many dims
* fix
* non-jit tests clear
* cleanup instantiations
* cpu merges
* change dim specializations
* optimize
* fix jit
* fix jit
* use higher precision for integer sum+prod
* fixes
2024-11-21 19:53:00 -08:00
Awni Hannun
dcca0d7477
contiguous op / prim ( #1612 )
2024-11-21 19:51:49 -08:00
Awni Hannun
61d787726a
Fix view scalar bug segfault ( #1603 )
...
* fix view scalar bug
* fix view scalar bug
* one more fix
2024-11-19 10:54:05 -08:00
Angelos Katharopoulos
5e89aace9b
Fix concatenate vmap ( #1600 )
2024-11-19 10:44:04 -08:00
Awni Hannun
2419edd5b2
Faster indexing math in a few kernels ( #1589 )
...
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
2024-11-18 19:52:00 -08:00
Awni Hannun
bf481e8e5d
Fix sibling leak ( #1590 )
...
* add test
* fix + test
* fix fix
2024-11-18 19:17:01 -08:00
Awni Hannun
9d7fa6b8e6
Use osx deployment target to pick Metal version ( #1595 )
...
* choose metal based on deployment target rather than system version
* nit
* unused compile def
2024-11-18 19:16:49 -08:00
Angelos Katharopoulos
073076ac7d
2-Pass Sdpa Inference Kernel ( #1597 )
2024-11-18 17:31:53 -08:00
Awni Hannun
9bd03dd9b4
More buffer donation with no-ops ( #1591 )
...
* more donation
* fix test
* fix build
2024-11-18 08:35:41 -08:00
Awni Hannun
6931f84412
fix dispatch threads for a few kernels ( #1594 )
2024-11-18 08:35:25 -08:00
xnorai
16ec0556a0
Allocate raw JSON metadata buffer on the heap, and limit its size ( #1596 )
...
* Allocate raw JSON metadata buffer on the heap, and limit its size to 1GiB
* Set the upper size limit for the header to 100K as in Rust safetensors
2024-11-18 07:22:51 -08:00
Awni Hannun
610af352d4
Dispatch bf16 at run time when using the JIT ( #1584 )
...
* Dispatch bf16 at run time when using the JIT
* fix extension
* fix extension build
* fix extension build
* Update utils.h
2024-11-15 16:54:36 -08:00
Awni Hannun
b35f1e3c9c
fix donation in sdpa ( #1587 )
2024-11-13 17:21:13 -08:00
Awni Hannun
dfa0b9aab4
Cpu fast quantize ( #1578 )
...
* cpu quantize
* fix
2024-11-08 20:10:39 -08:00
Alex Barron
a4c47b0276
OOB QMV fix ( #1579 )
...
* fix oob access in qmv
* skip more
* fix small case
2024-11-08 17:59:45 -08:00
Alex Barron
111fefd5e9
Fix OOB access in qmv ( #1577 )
...
* fix oob access in qmv
* skip more
2024-11-08 15:41:30 -08:00
Awni Hannun
c1fe1ef081
Bfs width limit ( #1568 )
...
* width limit
* fix
* large limit
* put env vars in env namespace
2024-11-08 15:00:46 -08:00
Awni Hannun
91c0277356
fix per-example mask + docs in sdpa ( #1574 )
2024-11-08 11:51:15 -08:00
Awni Hannun
9f0d5c12fc
Fully wrap the command encoder ( #1572 )
...
* fully wrap the command encoder
* use consistent style + fix extensions
2024-11-08 11:50:21 -08:00
Awni Hannun
59247c2b62
add groups in conv2d ( #1569 )
2024-11-07 13:57:53 -08:00
Awni Hannun
9a3842a2d9
fix ( #1566 )
2024-11-06 17:10:33 -08:00
Awni Hannun
54f05e7195
Fix gather vmap ( #1563 )
...
* fix gather
* fix
2024-11-05 11:29:20 -08:00
Alex Barron
26be608470
Add split_k qvm
for long context ( #1564 )
...
* Add splitk qvm
* configurable splitk
* tuning
* remove extra instantiation
* remove refactor
* separate test
* cpu tolerance
2024-11-05 11:25:19 -08:00
Angelos Katharopoulos
248431eb3c
Reductions update ( #1351 )
2024-11-04 22:25:16 -08:00
Awni Hannun
76f275b4df
error in rms for wrong size ( #1562 )
2024-11-04 13:24:02 -08:00
Awni Hannun
f1951d6cce
Use fewer barriers ( #1561 )
...
* use fewer barriers
* comment
2024-11-04 10:26:49 -08:00
Angelos Katharopoulos
62f297b51d
Sdpa fix ( #1558 )
2024-11-02 21:25:46 -07:00
Awni Hannun
57c6aa7188
fix multi output leak ( #1548 )
2024-10-31 09:32:01 -07:00
Awni Hannun
4f72c66911
improvements to scatter / gather ( #1541 )
2024-10-30 19:30:54 -07:00
Jagrit Digani
960e3f0f05
Gemm update ( #1518 )
2024-10-30 19:30:28 -07:00
Awni Hannun
884af42da2
Fix thread group for large arrays ( #1543 )
...
* fix thread group for large arrays
* comment
* one more
2024-10-30 16:25:12 -07:00
Alex Barron
048fabdabd
Fix vmap constant output size ( #1524 )
...
* use inputs to determine output size
* remove noop vmap tests
2024-10-30 16:16:53 -07:00
Carlo Cabrera
1a992e31e8
Skip using Residency sets in VMs ( #1537 )
...
* Skip using Residency sets in VMs
Attempting to use residency sets in a VM throws[^1]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [metal::Device] Unable to construct residency set.
Not quite sure if this is the best fix, but it does make the error go
away.
Note that it was previously possible to run simple programs that used
mlx in a VM prior to 0eb56d5be0
. See
related discussion at Homebrew/homebrew-core#195627 .
[^1]: https://github.com/Homebrew/homebrew-core/actions/runs/11525831492/job/32105148462#step:3:56
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* change residency check
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-29 19:37:23 -07:00
Awni Hannun
015c247393
change wino dispatch conditoin ( #1534 )
2024-10-28 11:13:44 -07:00
Awni Hannun
d3cd26820e
Faster bits and bernoulli ( #1535 )
...
* faster bits and bernoulli
* fix bernoulli
2024-10-28 11:11:00 -07:00
Awni Hannun
8e88e30d95
BFS graph evaluation order ( #1525 )
...
* bfs order
* try fix event issue
2024-10-25 10:27:19 -07:00
Awni Hannun
0eb56d5be0
Wired ( #1510 )
...
* expose residency sets as wire/unwire
* returns wired size
* fix
* runtime support check
* fix os check
* fix test
* fix no metal build
* docs
* nit
* nits in docs
* nits
2024-10-25 09:35:33 -07:00
Awni Hannun
dad1b00b13
fix ( #1523 )
2024-10-24 19:17:46 -07:00
Angelos Katharopoulos
c9b41d460f
Working 64-bit scans ( #1506 )
2024-10-24 11:05:46 -07:00
xnorai
32972a5924
C++20 compatibility for fmt ( #1519 )
...
* C++20 compatibility for fmt
* Address review feedback
* Remove stray string
* Add newlines back
2024-10-24 08:54:51 -07:00
Dhruv Govil
f6afb9c09b
Remove use of vector<const T> ( #1514 )
2024-10-22 16:31:52 -07:00
Kashif Rasul
3ddc07e936
Eigenvalues and eigenvectors ( #1334 )
...
* initial eigvalsh
* add compute_vectors
* add compute_vectors_
* return a pair
* add eigh to return only eigenvectors
* fixed typo
* merge merge Eighvalsh and Eigh into a single primitive
* use the same primate with the flag
* fix primatives
* use MULTI
* fix eval_gpu
* fix decleration
* rename EighPrimitive to Eigh
* tests
* tests
* fix rebase and format
* cleanup lapack
* format
* add cblas.h
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:18:48 -07:00
Awni Hannun
c26208f67d
Remove Hazard tracking with Fences ( #1509 )
...
* remove hazard tracking
* with fence map
* no hazard tracking with fences
* nits
* fix fence retain
* cleanup
* fix quantized rebase
2024-10-21 19:33:32 -07:00
Alex Barron
d15fa13daf
Batched Quantized Matmul + Fast Small QMV ( #1503 )
...
* add fast qmv for small dims
* fix test
* batched cpu
* add batched template param
* refactor metal quantized.cpp
2024-10-21 16:23:17 -07:00
Awni Hannun
92d7cb71f8
Fix compile ( #1501 )
...
* fix compile
* fix space
2024-10-18 11:06:40 -07:00
Angelos Katharopoulos
50d8bed468
Fused attention for single query ( #1497 )
2024-10-18 00:58:52 -07:00
Awni Hannun
3f86399922
Real and Imag ( #1490 )
...
* real and imag
* fix
* fix
2024-10-15 16:23:15 -07:00
Awni Hannun
0ab8e099e8
Fix cpu segfault ( #1488 )
...
* fix cpu segfault
* nit in tests
2024-10-14 16:17:03 -07:00
Awni Hannun
020f048cd0
A few updates for CPU ( #1482 )
...
* some updates
* format
* fix
* nit
2024-10-14 12:45:49 -07:00
Awni Hannun
881615b072
Faster metal compiled kernels + some fixes ( #1486 )
...
* bump mac tests to use py39
* work per thread for compiled kernels
* fixe for large arrays
* fix
2024-10-14 12:45:38 -07:00
Awni Hannun
bf6ec92216
Make the GPU device more thread safe ( #1478 )
...
* gpu stream safety
* comment
* fix
2024-10-12 17:49:15 -07:00
Awni Hannun
e1c9600da3
Add mx.random.permutation
( #1471 )
...
* random permutation
* comment
2024-10-08 19:42:19 -07:00
Awni Hannun
1fa0d20a30
consistently handle all -inf in softmax ( #1470 )
2024-10-08 09:54:02 -07:00
Awni Hannun
3274c6a087
Fix array is_available race cases ( #1468 )
2024-10-07 19:13:50 -07:00
Angelos Katharopoulos
9b12093739
Add the roll op ( #1455 )
2024-10-07 17:21:42 -07:00
Awni Hannun
0070e1db40
Fix deep recursion with siblings ( #1462 )
...
* fix recursion with siblings
* fix
* add test
* increase tol
2024-10-07 06:15:33 -07:00
Awni Hannun
95d04805b3
Fix complex power on Metal ( #1460 )
2024-10-06 19:58:30 -07:00
Awni Hannun
e4534dac17
Conv grad with groups + bugfix ( #1449 )
...
* fix bug in flipped conv with groups, start of grad for groups
* fix
* fix
* fix + test
2024-10-06 07:08:53 -07:00
Awni Hannun
1bdc038bf9
fix argpartition + faster {arg} sorts / partitions ( #1453 )
2024-10-03 14:21:25 -07:00
Awni Hannun
5523d9c426
faster cpu indexing ( #1450 )
2024-10-03 13:53:47 -07:00
Angelos Katharopoulos
d878015228
Fix normalization check_input ( #1452 )
2024-10-03 13:26:56 -07:00
Angelos Katharopoulos
bacced53d3
Fix row reduce with very few rows ( #1447 )
2024-09-29 20:00:35 -07:00
Awni Hannun
11354d5bff
Avoid io timeout for large arrays ( #1442 )
2024-09-27 13:32:14 -07:00
Awni Hannun
718aea3f1d
allow take to work with integer index ( #1440 )
2024-09-26 15:58:03 -07:00
Awni Hannun
5b6f38df2b
Faster cpu ops ( #1434 )
...
* faster binary and cleaner copy
* use recursive template for other ops
* more cleanup
* fix from cleanup
* more clean
* fix binary
* use contiguous iterator
* add 3d
* nits
* fix
* fix?
* fix
* fix rebase
2024-09-26 09:19:13 -07:00
Awni Hannun
0b4a58699e
Some overhead reductions in mx.fast.metal_kernel ( #1437 )
...
* some overhead reductions
* fix
* use +=
* use more +=
2024-09-25 17:25:21 -07:00
Awni Hannun
4f9f9ebb6f
Faster Metal unary and binary for general case ( #1431 )
...
* faster unary and binary for general case
* update ternary + jit fix
* fix jit
* unary work per thread
2024-09-25 12:07:43 -07:00
Awni Hannun
afc9c0ec1b
dtype is copy assignable ( #1436 )
2024-09-25 12:07:13 -07:00
Awni Hannun
195b429d99
Put along axis + fixe for partition grad ( #1430 )
...
* put along axis, fixes for partition grad
* zeros for arg reduce
2024-09-23 10:03:38 -07:00
Awni Hannun
67b6bf530d
Optimization for general ND copies ( #1421 )
2024-09-17 17:59:51 -07:00
Nripesh Niketan
6af5ca35b2
feat: add cross_product ( #1252 )
...
* feat: add cross_product
* lint
* python binding
* refactor: Improve error message for cross_product function
* refactor: more close to numpy cross product
* refactor: improve error message for cross_product function
* finish
* fix acks
* allow old numpy
* doc
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-17 13:12:43 -07:00
Awni Hannun
4f46e9c997
More fixes for arrays with large sizes ( #1405 )
...
* compile works for big arrays when contiguous
* style
* nits in docs
* a bunch more stuff
* update jit
* update jit
* use constant for shapes and strides and remove elem_to_loc overload
* use kernel instantiation
* docs nits
* update binary and ternary
* comments
2024-09-17 12:46:31 -07:00
Angelos Katharopoulos
914409fef9
Data parallel helper ( #1407 )
2024-09-16 18:17:21 -07:00
Nripesh Niketan
669c27140d
Chore: add pre-commit hook for cmake ( #1362 )
...
* reset and lint
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-16 12:53:01 -07:00
Max-Heinrich Laves
adcc88e208
Conv cpu improvements ( #1410 )
2024-09-15 18:45:10 -07:00
Awni Hannun
d6492b0163
fix clip ( #1415 )
2024-09-14 16:09:09 -07:00
Awni Hannun
b3f52c9fbe
ensure io/comm streams are active before eval ( #1412 )
2024-09-14 06:17:36 -07:00
Angelos Katharopoulos
881f09b2e2
Allow querying the allocator for the buffer size ( #1404 )
2024-09-11 21:02:16 -07:00
Awni Hannun
02efb310ca
Xcode 160 ( #1384 )
...
* xcode 16.0 with debug tests
* limit nproc for builds
* vmap bug
* assert bug
* run python tests in debug mode
* fix view, bool copies preserve bits'
* actual view fix
2024-09-10 15:15:17 -07:00
Awni Hannun
e7e59c6f05
Fix copying scalars by adding fill_gpu ( #1402 )
...
* fix copying scalars by adding fill_gpu
* Another copy scalar changed to fill
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Awni Hannun
3ae6aabe9f
throw for certain cases of non captured inputs in compile ( #1401 )
2024-09-09 14:54:31 -07:00
xnorai
dc627dcb5e
Replace the use of result_of_t
with invoke_result_t
( #1397 )
...
* Fix C++20 incompatibility
* Fix C++20 incompatibility
2024-09-06 19:52:57 -07:00
Max-Heinrich Laves
efeb9c0f02
Transposed Convolution ( #1245 )
...
* initial implementation for conv_transpose
ran pre-commit
implemented conv_transpose
updated conv_general docstring
updated conv_general docstring
updated code comments
removed commented run_conv_checks
updated acknowledgments
added missing entry to ops.rst
added op to nn.layers
resolved merge conflicts
* removed ConvolutionTranspose primitive as suggested by reviewer
removed ConvolutionTranspose primitive as suggested by reviewer
* remove transpose flag, add another test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-06 19:52:38 -07:00
Awni Hannun
ba3e913c7a
Simplifications for MLX C ( #1396 )
...
* simplifications for MLX C
* use vectors instead of map
* update examples
2024-09-06 19:16:50 -07:00
Awni Hannun
7cca1727af
Fix slice data size ( #1394 )
...
* fix slice data size and add tests
* fix contiguous flag
* simplify stride and perform copy for non-contiguous arrays
* fix cpu
* comment
2024-09-04 19:10:43 -07:00
Awni Hannun
41c603d48a
fix jit reduce ( #1395 )
2024-09-04 14:03:10 -07:00
Angelos Katharopoulos
969337345f
Fix reduce edge case ( #1389 )
2024-09-01 21:37:51 -07:00
Angelos Katharopoulos
58dca7d846
Fix copy in the sort primitive ( #1383 )
2024-08-31 08:32:14 -07:00
Awni Hannun
0d302cd25b
Fix compiel with byte sized constants ( #1381 )
2024-08-30 17:24:35 -07:00
Alex Barron
da691257ec
Fix overflow in quantize/dequantize ( #1379 )
...
* add 2d indices to prevent overflow
* use nthreads not out size
2024-08-30 13:32:41 -07:00
Awni Hannun
dba2bd1105
Even Even Faster IO ( #1374 )
...
* even more faster io
* make reader pool static
* make python reader thread safe
* one more optimization
2024-08-29 16:05:40 -07:00
Alex Barron
28be4de7c2
Fix JIT reductions ( #1373 )
2024-08-28 16:39:11 -07:00
Awni Hannun
a6c3b38fba
Async load ( #1372 )
...
* async load
* async load
2024-08-28 14:21:55 -07:00
Awni Hannun
fcb65a3897
Even Faster I/O ( #1369 )
...
* try multithreading for faster IO
* smaller batch size
* Account for pread returning less than size
* nit
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-28 11:49:07 -07:00
Jeethu Rao
bd47e1f066
Fix neon_fast_exp and add more softmax tests ( #1367 )
2024-08-27 23:42:42 -07:00
Aditya Dhulipala
e6b223df5f
Pinv ( #875 )
2024-08-27 23:06:12 -07:00
Angelos Katharopoulos
e64349bbdd
Make eval just wait if all arrays are scheduled ( #1368 )
2024-08-27 17:01:22 -07:00
Angelos Katharopoulos
cdb59faea6
Adds send/recv ops in distributed ( #1366 )
2024-08-26 23:01:37 -07:00
Alex Barron
1d94ac3f90
Add optional headers to `mx.fast.metal_kernel
` ( #1358 )
2024-08-26 21:45:45 -07:00
Awni Hannun
5f7d19d1f5
MPI ops in GPU stream for faster comms ( #1356 )
2024-08-26 15:12:50 -07:00
Awni Hannun
2fdf9eb535
Fix ternary for large arrays ( #1359 )
...
* fix ternary for large arrays
* fix
2024-08-26 11:22:27 -07:00
Awni Hannun
860d3a50d7
fix extension metal library finding ( #1361 )
2024-08-26 09:18:50 -07:00
Angelos Katharopoulos
8081df79be
Fix boolean all reduce bug ( #1355 )
2024-08-24 10:09:32 -07:00
Nripesh Niketan
64bec4fad7
Chore: update pre-commit hooks ( #1353 )
...
* Chore: update pre-commit refs
* run pre-commit
2024-08-24 06:46:36 -07:00
Alex Barron
b96e105244
Add grid_sample
example to metal_kernel
docs ( #1352 )
...
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`
* add grid sample to docs
* zero_outputs -> init_value
* add missing header for linux
2024-08-23 18:24:16 -07:00
Angelos Katharopoulos
b57a52813b
Further reduction tuning ( #1349 )
...
* More reduction tuning
* Forgotten pdb
* Small column long row specialization
2024-08-23 10:35:25 -07:00
Alex Barron
da8deb2b62
fix bug with multiple attributes ( #1348 )
...
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-23 10:06:15 -07:00
Awni Hannun
98b6ce3460
Refactor reductions and fix scatter atomics for large sizes ( #1300 )
...
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-22 16:03:31 -07:00
Alex Barron
0fd2a1f4b0
Custom Metal Kernels from Python ( #1325 )
...
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
2024-08-22 13:46:29 -07:00
Awni Hannun
df3233454d
2d gather specialization ( #1339 )
2024-08-22 10:48:24 -07:00
Awni Hannun
8ae751d3da
fix io ( #1343 )
...
* fix io
* fix io
* comment
2024-08-21 13:14:46 -07:00
Awni Hannun
d40e76809f
Fix rope ( #1340 )
...
* add test
* fix rope
* fix test
2024-08-20 17:37:52 -07:00
Awni Hannun
bb1b76d9dc
RoPE with frequencies as optional input ( #1337 )
...
* start rope with freq input
* rope with frequencies
* nits
* fix bug
* fix bug + test
* cleanup
* optional base
2024-08-19 18:30:50 -07:00
Angelos Katharopoulos
9d26441224
Fix contiguity check ( #1336 )
...
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-19 16:05:06 -07:00
Awni Hannun
f12f24a77c
fix compiling with space in paths ( #1332 )
2024-08-15 16:39:24 -07:00
Awni Hannun
d0630ffe8c
Read arrays from files faster ( #1330 )
...
* read faster
* faster write as well
* set default permission for linux
* comment
2024-08-14 20:09:56 -07:00
Alex Barron
99bb7d3a58
GPU mx.sign for complex64 ( #1326 )
2024-08-14 07:54:53 -07:00
Awni Hannun
eaaea02010
Add isfinite
( #1318 )
...
* isfinite
* remove reduce test since fix is not complete
2024-08-13 14:49:28 -07:00
Brian Keene
19fb69e2ed
Add memory_efficient_threshold kwarg to sdpa kernel ( #1319 )
...
Allows opt-in to memory efficient GPU shader at proscribed sequence
length. Otherwise, utilizes aggregate MLX primitives for best latency.
2024-08-12 12:57:09 -07:00
Alex Barron
32668a7317
CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv ( #1307 )
...
* add cholesky inv + tri inv
* always run tri_inv on cpu
* consistent naming
2024-08-08 15:18:02 -07:00
Angelos Katharopoulos
eb8819e91e
Revert variance to be numerically stable ( #1314 )
2024-08-08 13:35:02 -07:00
Awni Hannun
30bbea2f08
Add gemv masked to JIT plus some fixes ( #1310 )
...
* add gemv masked to JIT plus some fixes
* some cleanup
* add utils
* fix
* fix 2
* more cleaning
* fix
* remove unused mps matmul support
* one more nit
* revert
2024-08-07 13:38:07 -07:00
Alex Barron
635ccd9e25
Add "edge" mode to mx.pad ( #1309 )
...
* Add edge padding mode
* fix pad in pooling
* string arg instead of enum
2024-08-06 11:23:10 -07:00
nicolov
8c9f0278b9
Add vmap to scatter ( #1200 )
...
* Add vmap to scatter
* updates
* vmap updates + a few more tests
* bug fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-05 20:12:27 -07:00
Awni Hannun
58d0e199e1
add bfloat conv for windograd ( #1306 )
...
* add bfloat conv for windograd
* accumulate in fp32
* accumulate in fp32
* accumulate in bf16
2024-08-05 15:51:13 -07:00
Awni Hannun
43ffdab172
fix rope and random ( #1301 )
...
* fix rope and random
* comment
2024-07-31 16:18:25 -07:00
Awni Hannun
40b6d67333
Fixes for large arrays with a few ops ( #1299 )
...
* fixes for large arrays with a few ops
* fix bug
* fix all of copy
2024-07-30 17:18:39 -07:00
Alex Barron
c52d1600f0
Fused Affine Quantize/Dequantize ops ( #1282 )
...
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
2024-07-29 15:11:38 -07:00
Awni Hannun
7b456fd2c0
Array api ( #1289 )
...
* some updates for numpy 2.0 and array api
* some updates for numpy 2.0 and array api
* fix array api doc
2024-07-26 10:40:49 -07:00
Anton Belov
5029894662
[Issue #1187 ] Add nan_to_num function initial attempt ( #1247 )
...
* initial attempt, working with wrong types
* not compiling; mx.float16 and mx.bfloat16 tests added
* fix nan to num
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-07-25 09:57:37 -07:00
Awni Hannun
baf9fa5f42
Einsum ( #1269 )
...
* einsum initial
* fix comma break
* sum axis was wrong
* small cleanups
* python binding
* changed bindings to resemble numpy
* remove todo comment
* comment changes
* add count of operands/inputs
* fail fast if operands list is empty
* ignore comma if no output
* einsum path matching numpy
* getting somewhere with path
* remove print
* it passes the first test
* moved einsum tests to seperate file
* seperated einsum path
* moved einsum naive
* remove space from equation
* fast fail if no operands passed
* update tests and remove printf
* small cleanup
* some more cleanups
* removed python helper file
* ack
* utilize std for finding min in vector
* duplicate def
* remove the tuple as it was unreadable
* moved einsum_naive back to ops
* remaining isn't needed
* avoid creating another set
* cleanup
* greedy path, start of naive einsum
* more einsum
* fix some bugs
* some more fixes, tests pass
* benchmark
* some simplify
* fix einsum and test
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* add a bunch more tests and fix a bunch more bugs
* some docs nits
---------
Co-authored-by: dc-dc-dc <dgcruz983@gmail.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-07-25 09:36:44 -07:00
Jagrit Digani
7f914365fd
Fix GPU sort for large arrays ( #1285 )
...
* Fix GPU sort for large arrays
2024-07-24 14:37:10 -07:00
fgranqvist
50eff6a10a
Implement sampling from laplace distribution. ( #1279 )
2024-07-24 15:15:37 +02:00
Alex Barron
c34a5ae7f7
Fix bfloat16 Hadamard ( #1283 )
...
* fix bfloat16 hadamard
* add scale
* review comments
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-07-23 14:54:43 -07:00
Awni Hannun
e2aa6ec8ae
some fixes ( #1281 )
2024-07-23 11:49:05 -07:00
Tim Gymnich
6307d166eb
Fix overflow / underflow handling for expm1f ( #1278 )
...
* Fix overflow / underflow handling for expm1f
* update tests
2024-07-23 07:29:06 -07:00
Awni Hannun
1fba87b0df
Fix leak with multi-output primitives ( #1274 )
...
* fix leak with multi-output primitives
* hopefully an actual fix
2024-07-23 06:34:18 -07:00
Awni Hannun
df124e018a
fix gguf ( #1273 )
...
* fix gguf
* comment
2024-07-18 07:35:35 -07:00
Cheng
2f83d6e4b7
Do not release buffers on exit ( #1142 )
2024-07-15 15:12:24 -07:00
Feng Shijie
987785d8d7
Fix typo and missing header ( #1266 )
2024-07-15 08:20:24 -07:00
Angelos Katharopoulos
5c1fa64fb0
Custom transforms ( #1246 )
2024-07-10 18:00:01 -07:00
Alex Barron
a3c287354f
Fast Hadamard Transform ( #1249 )
...
* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent
2024-07-09 20:39:01 -07:00
Angelos Katharopoulos
03cf033f82
Fix reshape copy bug ( #1253 )
2024-07-07 21:37:00 -07:00
Alex Barron
bdb36c9a63
add zero vjps for bitwise ops and gather w.r.t. index ( #1256 )
2024-07-07 21:34:59 -07:00
Awni Hannun
20bb301195
CPU binary reduction + Nits ( #1242 )
...
* very minor nits
* reduce binary
* fix test
2024-06-28 13:50:42 -07:00
Angelos Katharopoulos
b05bcfd27f
Fixes segfault when compiling checkpointed functions ( #1235 )
2024-06-26 16:14:45 -07:00
Alex Barron
2615660e62
Fix strided sort bug ( #1236 )
...
* Use output strides in sort kernel
* fix zero strides bug
2024-06-26 14:32:11 -07:00
Awni Hannun
5b0af4cdb1
fix donation condition for compilation ( #1237 )
2024-06-26 09:04:05 -07:00
Jagrit Digani
8c2e15e6c8
Accelerate import updates for iOS ( #1227 )
...
* Update veclib and bnns includes to #include <Accelerate/Accelerate.h> for compatibility with ios
* Mark float literals in softmax.cpp to be float16_t for errors in ios
* Add arm neon vector operation guards
* Redirect to common backend for consistency
2024-06-26 09:01:50 -07:00
Awni Hannun
56c8a33439
Get metal version from xcode ( #1228 )
...
* get metal version from xcode
* typo
* fix
2024-06-26 07:02:11 -07:00
Jagrit Digani
2d6cd47713
Masked gemv ( #1211 )
2024-06-14 09:52:26 -07:00
Awni Hannun
fe3167d7ea
smaller CPU binary ( #1203 )
...
* smaller CPU binary
* fix no cpu build
2024-06-14 09:46:55 -07:00
Awni Hannun
31e134be35
Build for macOS 15 ( #1208 )
...
* Build for macos 15
* metal32 as well
* comment
---------
Co-authored-by: Awni Hannun <Awni Hannun>
2024-06-13 13:31:44 -07:00
Fangjun Kuang
f20e97b092
minor fixes ( #1194 )
...
* minor fixes
* fix build errors
2024-06-12 22:06:49 -07:00
Alex Barron
934683088e
Refactor JIT for unary/binary/ternary ops ( #1206 )
...
* refactor unary/binary/ternary ops
* get_primitive_string util
---------
2024-06-12 14:22:12 -07:00
Awni Hannun
de2b9e7d0a
Fix kernel deps to reduce build times ( #1205 )
2024-06-12 11:17:39 -07:00
Alex Barron
dd7d8e5e29
Add Quantized Ops to the JIT ( #1204 )
...
* JIT for quantized ops
* remove unused imports
* address comments
* fix imports
* second attempt to fix imports
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-12 09:47:12 -07:00
Awni Hannun
df964132fb
fix scatter + test ( #1202 )
...
* fix scatter + test
* fix test warnings
* fix metal validation
2024-06-11 14:35:12 -07:00
Alex Barron
27d70c7d9d
Feature complete Metal FFT ( #1102 )
...
* feature complete metal fft
* fix contiguity bug
* jit fft
* simplify rader/bluestein constant computation
* remove kernel/utils.h dep
* remove bf16.h dep
* format
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-06 12:57:25 -07:00
nicolov
0e585b4409
Add docstring for scatter ( #1189 )
...
* Add docstring for scatter
* docs nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-06 11:51:25 -07:00
Angelos Katharopoulos
0163a8e57a
Add docs for the distributed namespace ( #1184 )
2024-06-06 11:37:00 -07:00
Awni Hannun
578842954c
fix jit scan when output doesn't have primitive ( #1190 )
2024-06-06 07:24:58 -07:00
Awni Hannun
496315fe1d
Fix scan ( #1188 )
...
* fix scan
* improve grid size
* fix cpu cummax
2024-06-05 14:21:58 -07:00
Awni Hannun
83b11bc58d
Fix Metal API validation for empty concat ( #1183 )
2024-06-04 13:17:08 -07:00
Alex Barron
375a8bbdcc
Add some internal GPU apis ( #1177 )
...
* Add unary/binary/ternay/slice/concat internal GPU ops
* add pad internal op
* formatting + no_cpu fix
2024-06-04 09:24:26 -07:00
Awni Hannun
ea9090bbc4
Add view op ( #1179 )
...
* add view primitive
* nit
* fix view
2024-06-04 08:05:27 -07:00
Angelos Katharopoulos
3de8ce3f3c
In place all-reduce and forgiving init ( #1178 )
2024-06-03 16:47:47 -07:00
Alex Barron
4d485fca24
Add defines include ( #1176 )
...
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-03 09:50:10 -07:00
Brian Keene
1865299a30
Metal shaders for memory efficient self attention on large sequences ( #964 )
...
* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
2024-06-03 09:16:19 -07:00
Awni Hannun
fd1c08137b
stable cumprod grad at 0 ( #1167 )
2024-05-31 12:28:42 -07:00
Jagrit Digani
76b6cece46
Fix multi-block sort stride management ( #1169 )
...
* Fix multi-block sort stride management
* Add seed to tests
2024-05-31 11:10:54 -07:00
Jagrit Digani
9f0df51f8d
Fix matvec vector stride bug ( #1168 )
2024-05-29 12:18:28 -07:00
Awni Hannun
e7a2a3dcd1
Fix a couple bugs ( #1161 )
...
* fix jit reduce for RMS norm
* make strides a single buffer
* better eval error message
* fix compiling with inf and bf16
* fix cpu compile with bf16
2024-05-28 15:18:18 -07:00
Awni Hannun
a87ef5bfc1
fix broadcast bug in bitwise ops ( #1157 )
2024-05-24 11:44:40 -07:00
Awni Hannun
7e26fd8032
Option to JIT steel gemm / conv ( #1139 )
2024-05-23 18:07:34 -07:00
Jagrit Digani
eab2685c67
Float mask update ( #1152 )
...
* Float mask update
* Update CPU impl
2024-05-23 17:20:44 -07:00
Angelos Katharopoulos
50dfb664db
Comms ( #1097 )
...
* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed
2024-05-23 17:04:02 -07:00
Awni Hannun
0189ab6ab6
More jitting ( #1132 )
...
* docs + circle min size build
* jit scan, arange, softmax
* add sort
* jit reductions
* remove print
* fix deps
* clean includes / nits
2024-05-23 16:23:44 -07:00
Rifur13
9401507336
Add groups to 2-D convolutions ( #1129 )
...
* Added groups to 2-D convolutions. Only implemented for **some** specializations.
Also fixed 1D grouped convs with different kernel strides and added more tests.
* fix channels condition
2024-05-22 20:01:44 -07:00
Abe Leininger
79ef49b2c2
add mx.trace ( #1143 ) ( #1147 )
...
* working c++ trace implementation
* updated throw + added overloads
* added python binding for trace function
* pre-commit reformatting
* add trace to docs
* resolve comments
* remove to_stream call
2024-05-22 15:50:27 -07:00
Awni Hannun
e110ca11e2
Fix offset bug for device buffers ( #1151 )
...
* fix bug with large offsets for buffers
* add a test
* remove test as its too big for small machine
2024-05-22 15:50:05 -07:00
Awni Hannun
226748b3e7
JIT compile option for binary minimization ( #1091 )
...
* try cpp 20 for compile
* unary, binary, ternary in jit
* nits
* fix gather/scatter
* fix rebase
* reorg compile
* add ternary to compile
* jit copy
* jit compile flag
* fix build
* use linked function for ternary
* some nits
* docs + circle min size build
* docs + circle min size build
* fix extension
* fix no cpu build
* improve includes
2024-05-22 12:57:13 -07:00
Awni Hannun
d568c7ee36
Rename block sparse ( #1149 )
...
* block_sparse_mm to gather_mm
* rename
* nit
* nit
2024-05-22 07:48:34 -07:00
Angelos Katharopoulos
da83f899bb
Improve qvm speed ( #1140 )
2024-05-20 09:20:44 -07:00
Awni Hannun
fb71a82ada
Fix copy bug with many dims ( #1137 )
2024-05-17 21:10:03 -07:00
Awni Hannun
23406c9e9e
Choose the right MLX bf16 for extensions ( #1135 )
...
* default to custom bf
* choose right bf
* fix extensions
* fix circle conf
2024-05-17 15:09:28 -07:00
Luca Arnaboldi
b3ec792380
Implemented Cholesky on CPU ( #1119 )
2024-05-17 12:31:59 -07:00
Angelos Katharopoulos
e78a6518fa
Block sparse qmm ( #1124 )
2024-05-16 15:24:14 -07:00
Awni Hannun
1873ffda01
Detect metal version and propagate correctly for JIT ( #1109 )
...
* detect metal version and propagate correctly for JIT
* remove softmax
* fix versions
2024-05-15 17:42:09 -07:00
Jagrit Digani
358e1fd6ab
Fused GEMM ( #1123 )
...
* Basic gemm working
* Update addmm
* Clear out steel_gemm and steel_addmm kernels
* Fuse and clear out gather gemm
* Update objc releases
2024-05-15 10:30:41 -07:00
Awni Hannun
863039da4c
Allow scatter type exception to be caught by checking in op ( #1077 )
...
* allow exception to be caught in main thread
* only for gpu
* more detailed scatter error
2024-05-13 17:43:53 -07:00
Awni Hannun
7178ac0111
No CPU option for binary minimization ( #1105 )
...
* no cpu build option
* docs
* fix
2024-05-13 16:08:11 -07:00
Max-Heinrich Laves
ff4223904d
Conv3d ( #993 )
...
* added conv3d
added conv3d
implemented explicit_gemm_conv_ND_cpu and bounds checks for slow_conv_3D
* incorporated reviewer comments
* fixed test
* reduced tensor shapes in test for conv3d
* Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
2024-05-11 06:15:02 -07:00
Awni Hannun
a9f80d60f6
improve error messaging in eval ( #1101 )
2024-05-10 10:04:07 -07:00
Alex Barron
2e158cf6d0
Add conjugate operator ( #1100 )
...
* cpu and gpu impl
* add mx.conj and array.conj()
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-05-10 07:22:20 -07:00
Awni Hannun
8b1906abd0
Add compiler flags to disable safetensors and gguf ( #1098 )
...
* with docs
* nit
2024-05-09 17:39:44 -07:00
Awni Hannun
06375e6605
Split encoders in non-concurrent context with a max ops per encoder ( #1085 )
...
* split encoders
* fix race
2024-05-09 16:21:02 -07:00
Rahul Yedida
cc05a281c4
Added ArcTan2 operation ( #1079 )
...
* Added ArcTan2 operation
* Cleanup, bug fixes from code review
* Minor cleanup, fixed Linux tests
2024-05-08 08:35:15 -07:00
Jagrit Digani
fe96ceee66
Update block offset adjustment to be in size_t ( #1087 )
2024-05-08 08:10:23 -07:00
Awni Hannun
21623156a3
Reset peak memory ( #1074 )
...
* reset peak memory
* fix linux
* nits in docs
2024-05-03 17:12:51 -07:00
Awni Hannun
b00ac960b4
change initial memory limits and add memory size to device info ( #1064 )
2024-05-03 06:50:15 -07:00
Jagrit Digani
f390957685
Block sparse mm ( #1058 )
2024-05-02 14:03:58 -07:00
Angelos Katharopoulos
17f57df797
Improvements in the quantizer and dequantization kernel ( #1061 )
2024-05-01 18:19:11 -07:00
Awni Hannun
7f7b9662ea
Fix leak for multi-output primitives which are never detached ( #1059 )
...
* fix multi output leak
* ignore arrays that will be detached
* add some comments
* stray print
2024-05-01 07:31:45 -07:00
Awni Hannun
19bef39f5c
Add a mx.metal.device_info
( #1060 )
...
* device inof
* add variant
* fix linux
* fix doc
2024-04-30 15:47:27 -07:00
Nripesh Niketan
a30e7ed2da
feat: metal formatting and pre-commit bump ( #1038 )
...
* feat: metal formatting and pre-commit bump
* add guards
* update
* more guards
* more guards
* smakk fix
* Refactor instantiation of ternary types in ternary.metal
* fix scan.metal
2024-04-30 07:18:09 -07:00
Angelos Katharopoulos
8db7161c94
Bug fix in quantize ( #1054 )
2024-04-29 20:55:04 -07:00
Awni Hannun
09f1777896
fix slice update indexing ( #1053 )
2024-04-29 12:17:40 -07:00