Awni Hannun
b21242faf1
Allow unary ops to accept array like ( #1093 )
2024-05-09 09:36:02 -07:00
Rahul Yedida
cc05a281c4
Added ArcTan2 operation ( #1079 )
...
* Added ArcTan2 operation
* Cleanup, bug fixes from code review
* Minor cleanup, fixed Linux tests
2024-05-08 08:35:15 -07:00
Awni Hannun
9814a2ae12
fix conversion to array ( #1070 )
2024-05-06 16:02:49 -07:00
Shubham
6992498e7a
add keyword positonal ( #1081 )
2024-05-06 07:18:49 -07:00
Awni Hannun
21623156a3
Reset peak memory ( #1074 )
...
* reset peak memory
* fix linux
* nits in docs
2024-05-03 17:12:51 -07:00
Nripesh Niketan
79c859e2e0
feat: implement clip_grad_norm
( #1043 )
...
* feat: implement `clip_grad_norm`
* pre-commit
* Add test for clip_grad_norm function in test_optimizers.py
* small fixes
* fix
* lint
* Update tree_reduce
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/mlx/utils.py
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Refactor clip_grad_norm function to include documentation and improve readability
* format docstring
* Add acknowlegements
* text wrap
* pre-commit
* nits in docs
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-05-03 09:07:02 -07:00
Awni Hannun
b00ac960b4
change initial memory limits and add memory size to device info ( #1064 )
2024-05-03 06:50:15 -07:00
Jagrit Digani
f390957685
Block sparse mm ( #1058 )
2024-05-02 14:03:58 -07:00
Angelos Katharopoulos
17f57df797
Improvements in the quantizer and dequantization kernel ( #1061 )
2024-05-01 18:19:11 -07:00
Awni Hannun
7f7b9662ea
Fix leak for multi-output primitives which are never detached ( #1059 )
...
* fix multi output leak
* ignore arrays that will be detached
* add some comments
* stray print
2024-05-01 07:31:45 -07:00
Awni Hannun
19bef39f5c
Add a mx.metal.device_info
( #1060 )
...
* device inof
* add variant
* fix linux
* fix doc
2024-04-30 15:47:27 -07:00
Angelos Katharopoulos
8db7161c94
Bug fix in quantize ( #1054 )
2024-04-29 20:55:04 -07:00
Awni Hannun
09f1777896
fix slice update indexing ( #1053 )
2024-04-29 12:17:40 -07:00
Jacket
490c0c4fdc
[Fix] expand axes for dimension with integer indices in mlx_slice_update ( #1035 )
...
* Not sure if this is correct
* Format
* Edit tests
* Add negative test
* Format
* add one more test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-29 07:57:28 -07:00
Rifur13
c4a471c99d
Add groups to Conv1d ( #948 )
...
* Add conv1d grouped convs on CPU
* Add GPU support
* Parallelize inside metal kernel
* clenaup
* Update mlx/ops.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* New unfold kernel + remove unused code
* Remove copy and refactor
* Update vjp and reuse steel gemm
* Fixed groups on cpu
* Fix metal validation
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-04-27 06:24:57 -07:00
Awni Hannun
86f495985b
Add bitwise ops ( #1037 )
...
* bitwise ops
* fix tests
2024-04-26 22:03:42 -07:00
Awni Hannun
5bfe89bdb1
Cpp docs ( #1036 )
...
* start of C++ docs
* fix stream doc
* only include ops for now
2024-04-26 12:56:05 -07:00
Awni Hannun
771575d27b
Expose function to clear memory cache ( #1032 )
...
* expose function to clear memory cache
* fix linux build
* fix metal tests
2024-04-24 16:48:51 -07:00
Angelos Katharopoulos
ec8578d41a
Fix quantization of all 0s ( #1028 )
2024-04-24 00:40:42 -07:00
Aneesh Shetty
d0dbfe0b97
Adds radians and degrees ( #1011 )
2024-04-22 11:17:49 -07:00
Awni Hannun
3d405fb3b1
Add synchronize function ( #1006 )
...
* add synchronize function
* fix linux
* fix linux
* fix and fix docs
* fix test
* try synchronize in stream destroy
* synchronize works for both cpu and gpu
2024-04-22 08:25:46 -07:00
Angelos Katharopoulos
84d61d27aa
Make sure 0 is represented in the quantization ( #1016 )
2024-04-19 19:47:26 -07:00
Angelos Katharopoulos
ef5f7d1aea
Fix buffer protocol buffer size designation ( #1010 )
2024-04-19 06:06:13 -07:00
Jagrit Digani
85c8a91a27
Fix mask broadcasting bug and add relevant test ( #1003 )
2024-04-17 17:33:48 -07:00
Piotr Rybiec
581b699ac9
avgpool, not maxpool ( #1002 )
2024-04-17 08:26:22 -07:00
Awni Hannun
8a0677d56d
Shared events for synchronization + async eval ( #998 )
...
* more async eval
* fix rebase
* try correct async eval
* fix async
* more tests for async eval
* use shared events for synchronization
* comment + cleanup
* with autorelease pool
* fix no metal build
* fix compile
* fix patch
* don't eval if asyn evale'd
* don't use is_evaled
* comments
* more multi stream tests
* try and cleanup use of is_evaled
* use a status flag
2024-04-17 06:16:02 -07:00
Jagrit Digani
b18468bf81
Masked mm ( #978 )
...
* Add block masked matmul op and primitive
2024-04-16 14:45:39 -07:00
Shiyu
107ba2891a
gelu tanh approx ( #989 )
...
* gelu tanh approx
* gelu tanh approx
* replace gelu approx with tanh approach
* fix comments
* fix comment
2024-04-15 19:49:00 -07:00
Awni Hannun
cd9e184529
Quantize embedding ( #994 )
...
* quantize embedding
* rename as_linear + comment
* consistency in docs
* fix test
2024-04-15 16:42:10 -07:00
Alex Barron
2e7c02d5cd
Metal FFT for powers of 2 up to 2048 ( #915 )
...
* add Metal FFT for powers of 2
* skip GPU test on linux
* fix contiguity bug
* address comments
* Update mlx/backend/metal/fft.cpp
* Update mlx/backend/metal/fft.cpp
* fix bug in synch
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-11 21:40:06 -07:00
Awni Hannun
ae18326533
No copy command encoder ( #986 )
...
* no copy command encoder
* up layer norm test tolerances
2024-04-11 21:15:36 -07:00
Awni Hannun
12d4507ee3
Explicit barriers with concurrent dispatch ( #977 )
2024-04-10 21:45:31 -07:00
Shiyu
061cf9a4ce
Upsample with bicubic interpolation ( #967 )
2024-04-10 15:47:22 -07:00
Awni Hannun
99abb9eff4
Async eval ( #972 )
2024-04-09 18:34:00 -07:00
Luca Arnaboldi
fffe072028
Implementation of mlx.random.multivariate_normal ( #502 ) ( #877 )
...
* Implementation of mlx.random.multivariate_normal (#502 )
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Updated typo in docstring
* Restricted multivariate_normal to float32
* Generic mean and variance shapes
* Review edits
* Update mlx/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Update python/src/random.cpp
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
* Test for ndim of mean and cov
* nits
* smaller size for test
* fix broadcasted sampling
---------
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-09 13:50:12 -07:00
Abe Leininger
a1a31eed27
Add mx.meshgrid ( #961 )
2024-04-09 11:43:08 -07:00
Awni Hannun
42afe27e12
std and expm1 ( #973 )
...
* std and expm1
* actually add expm1
* fix linux
* fix vjp
* relax tol for linux test
* Add it to the compilable primitives
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-04-08 14:26:01 -07:00
Awni Hannun
76e63212ff
Enable bfloat scan ( #974 )
...
* enable bfloat scan
* fix tests
2024-04-08 12:29:19 -07:00
Awni Hannun
aac2f9fb61
Improve profiling with gpu tracing ( #969 )
...
* improve profiling with gpu tracing
* fix for linux
* nit
* doc fix
* fix example
2024-04-07 21:47:43 -07:00
Awni Hannun
039da779d1
No quant reshape ( #957 )
...
* precise option on cpu
* remove print
* remove reshape in quant matmul
* no quant reshape
2024-04-04 11:52:12 -07:00
Awni Hannun
d88d2124b5
segfaut layer norm grad ( #955 )
2024-04-04 10:59:15 -07:00
Awni Hannun
e142aaf8a1
Option for precise softmax ( #953 )
...
* precise softmax
* Add an equivalency check
* Make the threadgroup memory definition fixed
* precise cpu softmax
* precise option on cpu
* remove print
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-04-04 08:32:35 -07:00
AmirHossein_Razlighi
0caf35f4b8
Better exceptions in case of invalid operations on mlx.core.array
( #910 ) ( #926 )
...
* Nicer exceptions for ops on non-arrays
2024-04-02 21:11:24 -07:00
Angelos Katharopoulos
3fc993f82d
Properly handle negative axes in python vmap ( #944 )
2024-04-02 18:07:23 -07:00
Awni Hannun
741eb28443
fix a couple bugs ( #952 )
2024-04-02 12:07:41 -07:00
Angelos Katharopoulos
1a87dc5ea8
Fix compile fusion for multi-output edge cases ( #950 )
...
* Fix compile fusion for multi-output edge cases
* Add a test for multi-output compile
2024-04-02 08:42:31 -07:00
Awni Hannun
2427fa171e
Fix cpu compile ( #934 )
...
* fix one cpu bug, test for another
* format hooks
* simplify contiguity check for cpu compile
* fix
* add back donation
* comment
2024-04-01 17:37:12 -07:00
Jagrit Digani
639e06e1f3
Indexing bug fix ( #947 )
...
* Fix axes accounting
* Add tests
2024-04-01 12:18:50 -07:00
Angelos Katharopoulos
02fedbf1da
Fix array initialization from list ( #942 )
...
* Fix array initialization from list
* Change the error message in the test
2024-04-01 06:27:52 -07:00
Angelos Katharopoulos
110d9b149d
Layer norm grad fix donation bug ( #941 )
...
* add layer norm grad test
* Fix donation bug in layernorm vjp
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-04-01 06:15:50 -07:00