nicolov
eaba55c9bf
Add matrix inversion primitive ( #822 )
2024-03-15 06:34:36 -07:00
Awni Hannun
19ec023256
vmap matmul and admm ( #836 )
2024-03-14 14:38:22 -07:00
Awni Hannun
63ab0ab580
version ( #835 )
2024-03-14 12:20:40 -07:00
Jagrit Digani
8dfc376c00
Strided reduce specialization for small reductions ( #826 )
...
* Add small column / general reduction specialization
2024-03-14 09:16:53 -07:00
Angelos Katharopoulos
1efee9db09
Add types and order in kernel name ( #831 )
2024-03-13 20:34:06 -07:00
Awni Hannun
43abc402d8
route to fallback ( #828 )
2024-03-13 19:56:04 -07:00
Angelos Katharopoulos
3f8b1668c4
Make reshape faster for row_contiguous cases ( #829 )
2024-03-13 16:22:03 -07:00
Angelos Katharopoulos
76c919b4ec
NumberOfElements for shapeless compile and vmap fixes ( #802 )
2024-03-13 10:34:14 -07:00
Angelos Katharopoulos
29d0c10ee5
Reshape improvement ( #818 )
2024-03-12 17:54:31 -07:00
Jagrit Digani
5ad133f8bb
No copy gems ( #801 )
...
* Enable collapsing batch dims in gemm
* Update gemm to only make copies when neither of the last 2 axes are contiguous
* Update addmm to support gemv shapes
* Update addmm to support irregular batch strides
* Update tests
2024-03-12 13:13:41 -07:00
nicolov
d0c544a868
Add SVD primitive ( #809 )
...
Add SVD op using Accelerate's LAPACK following
https://developer.apple.com/documentation/accelerate/
compressing_an_image_using_linear_algebra
Co-authored-by: Nicolo Valigi <nvaligi@apple.com>
2024-03-12 12:30:11 -07:00
Daniel Falbel
ffb19df3c0
Fix docstring for correctly rendering ( #820 )
2024-03-12 11:46:44 -07:00
Awni Hannun
8b7532b9ab
fix scatter ( #821 )
2024-03-12 11:42:07 -07:00
Awni Hannun
366478c560
fix modules with dict ( #819 )
2024-03-12 08:54:06 -07:00
Justin Deschenaux
8e5600022a
Implement RNN, GRU, LSTM ( #268 )
...
* RNN base implementation
* Address comments+format
* nits in docs
* add tests for prb
* fix test
* add a couple tests
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-11 21:14:44 -07:00
Awni Hannun
0e95b64942
Fix bug in tape order during simplify ( #816 )
...
* fix bug in tape order during simplify
* properly fix compile
* last bug
2024-03-11 17:29:05 -07:00
nicolov
0ae22b915b
Remove code duplication in reduce ops ( #793 )
...
* Remove code duplication in reduce ops
* Remove the unnecessary lambda
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-11 10:57:07 -07:00
Awni Hannun
7c441600fe
Compile stride bug ( #812 )
...
* fix compile stride bug
* revert sdpa fix
* fix cpu
* fix bug with simplifying outputs
2024-03-11 06:31:31 -07:00
Awni Hannun
a4d290adb9
Remove depth traversal ( #813 )
...
* no depth traversal
* counter outside loop
2024-03-09 20:21:32 -08:00
Awni Hannun
28301807c2
Version bump and os error ( #807 )
2024-03-07 13:57:58 -08:00
Awni Hannun
74ed0974b3
Support 13.0+ with xcode 14.3 ( #806 )
...
* Support 13.0+ with xcode 14.3
* revert revert
2024-03-07 13:27:57 -08:00
Jagrit Digani
ec8a4864fa
Fix SDPA kernel bug on Mac OS 13.3 SDK ( #805 )
...
* Move sdpa kernel to allocate tgp mem statically and allow macOS 13.3 SDK builds
* Style
2024-03-07 10:18:09 -08:00
Awni Hannun
b7588fd5d7
fix inplace to not make a shallow copy ( #804 )
2024-03-07 09:34:11 -08:00
Awni Hannun
f512b905c7
Minimum xcode / sdk ( #800 )
...
* minimum xcode /sdk
* try multiple xcode versions in CI
* update python
* metal validation for python tests
2024-03-07 08:19:43 -08:00
Awni Hannun
afd5274049
route to fallback for bfloat ( #794 )
2024-03-06 15:39:12 -08:00
Awni Hannun
1074674e32
Add a maximum graph depth ( #797 )
...
* add a maximum graph depth
* remember how to use C++
2024-03-06 15:39:00 -08:00
AlexCheema
7762e07fde
Update function_transforms.rst ( #796 )
...
Fix typo in function_transforms.rst
2024-03-06 12:03:37 -08:00
Luca Arnaboldi
cbefd9129e
Implementation of pickle, copy and deepcopy for Python arrays ( #300 & #367 ). ( #713 )
...
* Implemented pickling and copy for Python arrays(#300 & #367 )
* Fixing typos
* Pickle with NumPy arrays
* Pickle: workaround for bfloat16
* Revert "Pickle: workaround for bfloat16"
This reverts commit 25afe6bc09
.
* Added an error when pickling bfloat16
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/tests/test_array.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/array.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* clang-format applied
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-03-06 08:02:41 -08:00
Angelos Katharopoulos
e39bebe13e
Fix reshaping of empty arrays ( #791 )
2024-03-05 23:33:22 -08:00
Angelos Katharopoulos
14b4e51a7c
Improved quantized matrix vector product ( #786 )
2024-03-05 17:32:19 -08:00
Awni Hannun
cbcf44a4ca
Some fixes in cache / thread safety ( #777 )
...
* some fixes in cache / thread safety
* speed up no cache case
* fix opt test
* optimizer docs
* otpimizer docs
* fix adafactor
* fix adafactor
2024-03-05 13:30:50 -08:00
Awni Hannun
859ae15a54
Fix test ( #785 )
2024-03-04 23:02:27 -08:00
Brian Keene
0787724c44
Fast Inference SDPA op ( #735 )
...
* Fast Inference SDPA op
Implements metal shaders for:
o = mx.fast_inference_sdpa(queries, keys, values, scale, mask)
Supports fp16, fp32 dtypes; assumes d_k = 128.
Generic op support / prompt encoding supported via mlx primitives.
Metal implementation is for the inference use case only.
Majority of performance benefits appears to results from GQA & reduced
bandwidth requirements; there is approximate performance parity for the
MHA use case (from some measurements on M3 Max).
* Flush shared memory to zero before unprotected reads for (scores @ values)
* Move to fast:: namespace, address reviewer comments
... also attempt to revert formatter auto-change for files not relevant
to this change
* Shared memory flush to top of kernel
* Resolve compiler warnings
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/fast.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update docstring per PR feedback
* Softmax in higher precision, ...
* route to fallback for more use cases - batch size > 1, head_dim other
than 128, etc.
* Address linux build failure
* Address other reviewer comments
* Remove extraneous eval_cpu function per review
---------
Co-authored-by: Atila Orhon <64497909+atiorh@users.noreply.github.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: atila <atiorh@icloud.com>
2024-03-04 21:06:11 -08:00
Awni Hannun
7b463ffb07
Ios compile ( #784 )
...
* try to fix build for ios
* skip cpu compile
* fix namespace
* fix namespace
* Use CMake for platform specific cpu compile
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-03-04 20:02:26 -08:00
Jagrit Digani
6686e61ca4
Reduce update ( #783 )
...
* Split reduction files to reduce compile times
* Add small and medium axis size specializations for row reductions
* Add non-row-reduction options for small and med kernels
2024-03-04 19:09:51 -08:00
Awni Hannun
c096a77b9b
revision bump ( #778 )
2024-03-04 13:41:53 -08:00
Awni Hannun
5121f028d9
nice tensordot for mlx c ( #782 )
2024-03-04 09:51:02 -08:00
Piotr Rybiec
6a665ea6ed
Dilation for convolutional layers ( #766 )
...
* add dilation parameter to Conv1d layer
* space here too
* add conv1d dilation test
* add dilation parameter for Conv2d layer
* conv2d dilation test
2024-03-04 06:43:00 -08:00
Awni Hannun
bc06cb9ff6
Pickle + dtype fix for numpy conversion ( #763 )
...
* pickle + dtype fix for numpy conversion
* fix getattribute on Module base
* remove unused function
* fix tests
* add topk to ops
* fix doc
2024-03-02 06:09:29 -08:00
Angelos Katharopoulos
8e281c76c3
Fix the top-k op ( #768 )
2024-03-01 22:08:43 -08:00
Awni Hannun
d5964a2710
bindings for memory info ( #761 )
...
* bindings for memory info
* update api
* keep cache low if requested
* fix default
* nit in ops error
2024-03-01 19:51:58 -08:00
Ikko Eltociear Ashimine
cf3eb87e52
Fix typo in transforms.cpp ( #764 )
...
occuring -> occurring
2024-02-29 22:23:46 -08:00
Awni Hannun
ab3a466711
bump ( #760 )
2024-02-29 11:58:54 -08:00
Awni Hannun
4494970f47
avoid nested closures in module ( #759 )
2024-02-29 09:39:52 -08:00
Jagrit Digani
776c3d226d
Convolution update ( #651 )
...
* Init steel conv and update Conv primitive
* Update slow CPU implementation to support flipping and input dilation winograd conv routing
Co-authored-by: Awni Hannun <awni@apple.com>
2024-02-28 20:11:16 -08:00
Awni Hannun
f5f18b704f
fix temporary bug ( #752 )
2024-02-27 17:44:39 -08:00
Awni Hannun
420ff2f331
Add back compiled function signatures and docstrings ( #749 )
...
* try to add back compiled function signatures and docstrings
* add indentation to docstring
2024-02-27 13:18:59 -08:00
Awni Hannun
56ba3ec40e
fix cpu compile on older OS ( #747 )
2024-02-26 22:20:53 -08:00
Noah Kasmanoff
de3d2467a3
Update: Fast GeLU Approximation ( #744 )
...
* add: fast gelu approx
* fix docs
* Update gelu_fast_approx function documentation
* Update python/mlx/nn/layers/activations.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* fix: test gelu
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-02-26 21:08:50 -08:00
Awni Hannun
fe1dabf272
Fix compile with non standard types ( #745 )
...
* refactor tree utils
* fix compile + tree code refactor
* Add an extra test
* add a few missing activations to docs
* hash structure
* Encode the full argument structure
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-02-26 19:28:53 -08:00