Awni Hannun
fcb65a3897
Even Faster I/O ( #1369 )
* try multithreading for faster IO
* smaller batch size
* Account for pread returning less than size
* nit
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-28 11:49:07 -07:00
Saanidhya
4e22a1dffe
Continuation of PR #1243 to solve issue #1240 ( #1365 )
* Solves issue #1240
* Correction
* Update python/mlx/utils.py
* Update python/mlx/utils.py
---------
Co-authored-by: Awni Hannun <awni@apple.com>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-08-28 11:40:41 -07:00
Awni Hannun
291cf40aca
Some fixes to typing ( #1371 )
* some fixes to typing
* fix module reference
* comment
2024-08-28 11:16:19 -07:00
Jeethu Rao
bd47e1f066
Fix neon_fast_exp and add more softmax tests ( #1367 )
2024-08-27 23:42:42 -07:00
Aditya Dhulipala
e6b223df5f
Pinv ( #875 )
2024-08-27 23:06:12 -07:00
Angelos Katharopoulos
e64349bbdd
Make eval just wait if all arrays are scheduled ( #1368 )
2024-08-27 17:01:22 -07:00
Angelos Katharopoulos
cdb59faea6
Adds send/recv ops in distributed ( #1366 )
2024-08-26 23:01:37 -07:00
Alex Barron
1d94ac3f90
Add optional headers to `mx.fast.metal_kernel` ( #1358 )
2024-08-26 21:45:45 -07:00
Awni Hannun
5f7d19d1f5
MPI ops in GPU stream for faster comms ( #1356 )
2024-08-26 15:12:50 -07:00
Awni Hannun
2fdf9eb535
Fix ternary for large arrays ( #1359 )
* fix ternary for large arrays
* fix
2024-08-26 11:22:27 -07:00
Awni Hannun
860d3a50d7
fix extension metal library finding ( #1361 )
2024-08-26 09:18:50 -07:00
Alex Barron
d1183821a7
int() and float() for mx.array ( #1360 )
2024-08-25 20:41:44 -07:00
Angelos Katharopoulos
8081df79be
Fix boolean all reduce bug ( #1355 )
2024-08-24 10:09:32 -07:00
Nripesh Niketan
64bec4fad7
Chore: update pre-commit hooks ( #1353 )
* Chore: update pre-commit refs
* run pre-commit
2024-08-24 06:46:36 -07:00
Alex Barron
b96e105244
Add `grid_sample` example to `metal_kernel` docs ( #1352 )
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`
* add grid sample to docs
* zero_outputs -> init_value
* add missing header for linux
2024-08-23 18:24:16 -07:00
Awni Hannun
3b4d5484c7
Bump extension MLX version ( #1350 )
* Bump extension MLX version
* fix some docs nits
2024-08-23 12:38:34 -07:00
Alex Barron
684e11c664
patch ( #1347 )
2024-08-23 10:42:02 -07:00
Angelos Katharopoulos
b57a52813b
Further reduction tuning ( #1349 )
* More reduction tuning
* Forgotten pdb
* Small column long row specialization
2024-08-23 10:35:25 -07:00
Alex Barron
da8deb2b62
fix bug with multiple attributes ( #1348 )
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-23 10:06:15 -07:00
Awni Hannun
98b6ce3460
Refactor reductions and fix scatter atomics for large sizes ( #1300 )
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-22 16:03:31 -07:00
Awni Hannun
f9e00efe31
fix nanobind and stub gen in circle ( #1346 )
2024-08-22 14:07:27 -07:00
Alex Barron
0fd2a1f4b0
Custom Metal Kernels from Python ( #1325 )
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
2024-08-22 13:46:29 -07:00
Awni Hannun
df3233454d
2d gather specialization ( #1339 )
2024-08-22 10:48:24 -07:00
Awni Hannun
82db84b899
bump nanobind + fix extension ( #1344 )
2024-08-21 16:05:07 -07:00
Awni Hannun
8ae751d3da
fix io ( #1343 )
* fix io
* fix io
* comment
2024-08-21 13:14:46 -07:00
Awni Hannun
d40e76809f
Fix rope ( #1340 )
* add test
* fix rope
* fix test
2024-08-20 17:37:52 -07:00
Awni Hannun
bb1b76d9dc
RoPE with frequencies as optional input ( #1337 )
* start rope with freq input
* rope with frequencies
* nits
* fix bug
* fix bug + test
* cleanup
* optional base
2024-08-19 18:30:50 -07:00
Angelos Katharopoulos
9d26441224
Fix contiguity check ( #1336 )
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-08-19 16:05:06 -07:00
Awni Hannun
f12f24a77c
fix compiling with space in paths ( #1332 )
2024-08-15 16:39:24 -07:00
Awni Hannun
ae5b5cabfd
Fix optimizer reloading from checkpoint ( #1329 )
* fix optimizer reloading from checkpoint
* comment
2024-08-15 07:33:23 -07:00
Awni Hannun
d0630ffe8c
Read arrays from files faster ( #1330 )
* read faster
* faster write as well
* set default permission for linux
* comment
2024-08-14 20:09:56 -07:00
Alex Barron
99bb7d3a58
GPU mx.sign for complex64 ( #1326 )
2024-08-14 07:54:53 -07:00
Awni Hannun
63ae767232
fix transformer ( #1327 )
2024-08-13 16:04:26 -07:00
Awni Hannun
eaaea02010
Add `isfinite` ( #1318 )
* isfinite
* remove reduce test since fix is not complete
2024-08-13 14:49:28 -07:00
Bhargav Yagnik
a098bc92e0
Fix: Preserve input dtype in Dropout layer output ( #1323 )
* Fix: Preserve input dtype in Dropout layer output
- Modified Dropout implementation to ensure that the output dtype matches the input dtype.
- This resolves issue #1321
* Update test cases in test_nn.py
- Revised test cases to align with updated dropout code
- Fixed assertion method: replaced self.assertTrue with self.assertEqual for accurate comparisons in test_nn.py (test_rope, test_alibi, and test_dropout)
* updated dropout.py
2024-08-13 11:54:21 -07:00
Awni Hannun
1086dc4db0
patch ( #1320 )
2024-08-12 16:13:33 -07:00
Brian Keene
19fb69e2ed
Add memory_efficient_threshold kwarg to sdpa kernel ( #1319 )
Allows opting in to the memory-efficient GPU shader at a prescribed sequence
length. Otherwise, uses aggregate MLX primitives for the best latency.
2024-08-12 12:57:09 -07:00
Awni Hannun
9231617eb3
Move to nanobind v2 ( #1316 )
2024-08-08 17:17:46 -07:00
Alex Barron
32668a7317
CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv ( #1307 )
* add cholesky inv + tri inv
* always run tri_inv on cpu
* consistent naming
2024-08-08 15:18:02 -07:00
Angelos Katharopoulos
780c197f95
Fix test tolerance and patch bump ( #1315 )
2024-08-08 14:51:09 -07:00
Angelos Katharopoulos
eb8819e91e
Revert variance to be numerically stable ( #1314 )
2024-08-08 13:35:02 -07:00
Awni Hannun
30bbea2f08
Add gemv masked to JIT plus some fixes ( #1310 )
* add gemv masked to JIT plus some fixes
* some cleanup
* add utils
* fix
* fix 2
* more cleaning
* fix
* remove unused mps matmul support
* one more nit
* revert
2024-08-07 13:38:07 -07:00
Alex Barron
635ccd9e25
Add "edge" mode to mx.pad ( #1309 )
* Add edge padding mode
* fix pad in pooling
* string arg instead of enum
2024-08-06 11:23:10 -07:00
nicolov
8c9f0278b9
Add vmap to scatter ( #1200 )
* Add vmap to scatter
* updates
* vmap updates + a few more tests
* bug fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-08-05 20:12:27 -07:00
Awni Hannun
58d0e199e1
add bfloat conv for winograd ( #1306 )
* add bfloat conv for winograd
* accumulate in fp32
* accumulate in fp32
* accumulate in bf16
2024-08-05 15:51:13 -07:00
Awni Hannun
10b5835501
fix creating array from bf16 tensors in jax / torch ( #1305 )
2024-08-01 16:20:51 -07:00
Awni Hannun
6c8dd307eb
faster group norm ( #1304 )
2024-08-01 12:49:23 -07:00
Awni Hannun
43ffdab172
fix rope and random ( #1301 )
* fix rope and random
* comment
2024-07-31 16:18:25 -07:00
Awni Hannun
40b6d67333
Fixes for large arrays with a few ops ( #1299 )
* fixes for large arrays with a few ops
* fix bug
* fix all of copy
2024-07-30 17:18:39 -07:00
Alex Barron
c52d1600f0
Fused Affine Quantize/Dequantize ops ( #1282 )
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
2024-07-29 15:11:38 -07:00