Awni Hannun
0eb56d5be0
Wired ( #1510 )
...
* expose residency sets as wire/unwire
* returns wired size
* fix
* runtime support check
* fix os check
* fix test
* fix no metal build
* docs
* nit
* nits in docs
* nits
2024-10-25 09:35:33 -07:00
Awni Hannun
dad1b00b13
fix ( #1523 )
2024-10-24 19:17:46 -07:00
Angelos Katharopoulos
c9b41d460f
Working 64-bit scans ( #1506 )
2024-10-24 11:05:46 -07:00
xnorai
32972a5924
C++20 compatibility for fmt ( #1519 )
...
* C++20 compatibility for fmt
* Address review feedback
* Remove stray string
* Add newlines back
2024-10-24 08:54:51 -07:00
Dhruv Govil
f6afb9c09b
Remove use of vector<const T> ( #1514 )
2024-10-22 16:31:52 -07:00
Kashif Rasul
3ddc07e936
Eigenvalues and eigenvectors ( #1334 )
...
* initial eigvalsh
* add compute_vectors
* add compute_vectors_
* return a pair
* add eigh to return only eigenvectors
* fixed typo
* merge Eighvalsh and Eigh into a single primitive
* use the same primitive with the flag
* fix primitives
* use MULTI
* fix eval_gpu
* fix declaration
* rename EighPrimitive to Eigh
* tests
* tests
* fix rebase and format
* cleanup lapack
* format
* add cblas.h
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-10-22 12:18:48 -07:00
Awni Hannun
c26208f67d
Remove Hazard tracking with Fences ( #1509 )
...
* remove hazard tracking
* with fence map
* no hazard tracking with fences
* nits
* fix fence retain
* cleanup
* fix quantized rebase
2024-10-21 19:33:32 -07:00
Alex Barron
d15fa13daf
Batched Quantized Matmul + Fast Small QMV ( #1503 )
...
* add fast qmv for small dims
* fix test
* batched cpu
* add batched template param
* refactor metal quantized.cpp
2024-10-21 16:23:17 -07:00
Awni Hannun
92d7cb71f8
Fix compile ( #1501 )
...
* fix compile
* fix space
2024-10-18 11:06:40 -07:00
Angelos Katharopoulos
50d8bed468
Fused attention for single query ( #1497 )
2024-10-18 00:58:52 -07:00
Awni Hannun
3f86399922
Real and Imag ( #1490 )
...
* real and imag
* fix
* fix
2024-10-15 16:23:15 -07:00
Awni Hannun
881615b072
Faster metal compiled kernels + some fixes ( #1486 )
...
* bump mac tests to use py39
* work per thread for compiled kernels
* fix for large arrays
* fix
2024-10-14 12:45:38 -07:00
Awni Hannun
bf6ec92216
Make the GPU device more thread safe ( #1478 )
...
* gpu stream safety
* comment
* fix
2024-10-12 17:49:15 -07:00
Awni Hannun
1fa0d20a30
consistently handle all -inf in softmax ( #1470 )
2024-10-08 09:54:02 -07:00
Awni Hannun
3274c6a087
Fix array is_available race cases ( #1468 )
2024-10-07 19:13:50 -07:00
Awni Hannun
95d04805b3
Fix complex power on Metal ( #1460 )
2024-10-06 19:58:30 -07:00
Awni Hannun
e4534dac17
Conv grad with groups + bugfix ( #1449 )
...
* fix bug in flipped conv with groups, start of grad for groups
* fix
* fix
* fix + test
2024-10-06 07:08:53 -07:00
Angelos Katharopoulos
d878015228
Fix normalization check_input ( #1452 )
2024-10-03 13:26:56 -07:00
Angelos Katharopoulos
bacced53d3
Fix row reduce with very few rows ( #1447 )
2024-09-29 20:00:35 -07:00
Awni Hannun
11354d5bff
Avoid io timeout for large arrays ( #1442 )
2024-09-27 13:32:14 -07:00
Awni Hannun
5b6f38df2b
Faster cpu ops ( #1434 )
...
* faster binary and cleaner copy
* use recursive template for other ops
* more cleanup
* fix from cleanup
* more clean
* fix binary
* use contiguous iterator
* add 3d
* nits
* fix
* fix?
* fix
* fix rebase
2024-09-26 09:19:13 -07:00
Awni Hannun
0b4a58699e
Some overhead reductions in mx.fast.metal_kernel ( #1437 )
...
* some overhead reductions
* fix
* use +=
* use more +=
2024-09-25 17:25:21 -07:00
Awni Hannun
4f9f9ebb6f
Faster Metal unary and binary for general case ( #1431 )
...
* faster unary and binary for general case
* update ternary + jit fix
* fix jit
* unary work per thread
2024-09-25 12:07:43 -07:00
Awni Hannun
67b6bf530d
Optimization for general ND copies ( #1421 )
2024-09-17 17:59:51 -07:00
Awni Hannun
4f46e9c997
More fixes for arrays with large sizes ( #1405 )
...
* compile works for big arrays when contiguous
* style
* nits in docs
* a bunch more stuff
* update jit
* update jit
* use constant for shapes and strides and remove elem_to_loc overload
* use kernel instantiation
* docs nits
* update binary and ternary
* comments
2024-09-17 12:46:31 -07:00
Nripesh Niketan
669c27140d
Chore: add pre-commit hook for cmake ( #1362 )
...
* reset and lint
* format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-16 12:53:01 -07:00
Awni Hannun
b3f52c9fbe
ensure io/comm streams are active before eval ( #1412 )
2024-09-14 06:17:36 -07:00
Angelos Katharopoulos
881f09b2e2
Allow querying the allocator for the buffer size ( #1404 )
2024-09-11 21:02:16 -07:00
Awni Hannun
02efb310ca
Xcode 160 ( #1384 )
...
* xcode 16.0 with debug tests
* limit nproc for builds
* vmap bug
* assert bug
* run python tests in debug mode
* fix view, bool copies preserve bits
* actual view fix
2024-09-10 15:15:17 -07:00
Awni Hannun
e7e59c6f05
Fix copying scalars by adding fill_gpu ( #1402 )
...
* fix copying scalars by adding fill_gpu
* Another copy scalar changed to fill
---------
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-09-09 15:54:08 -07:00
Max-Heinrich Laves
efeb9c0f02
Transposed Convolution ( #1245 )
...
* initial implementation for conv_transpose
ran pre-commit
implemented conv_transpose
updated conv_general docstring
updated conv_general docstring
updated code comments
removed commented run_conv_checks
updated acknowledgments
added missing entry to ops.rst
added op to nn.layers
resolved merge conflicts
* removed ConvolutionTranspose primitive as suggested by reviewer
* remove transpose flag, add another test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-06 19:52:38 -07:00
Awni Hannun
7cca1727af
Fix slice data size ( #1394 )
...
* fix slice data size and add tests
* fix contiguous flag
* simplify stride and perform copy for non-contiguous arrays
* fix cpu
* comment
2024-09-04 19:10:43 -07:00
Awni Hannun
41c603d48a
fix jit reduce ( #1395 )
2024-09-04 14:03:10 -07:00
Angelos Katharopoulos
58dca7d846
Fix copy in the sort primitive ( #1383 )
2024-08-31 08:32:14 -07:00
Alex Barron
da691257ec
Fix overflow in quantize/dequantize ( #1379 )
...
* add 2d indices to prevent overflow
* use nthreads not out size
2024-08-30 13:32:41 -07:00
Awni Hannun
dba2bd1105
Even Even Faster IO ( #1374 )
...
* even more faster io
* make reader pool static
* make python reader thread safe
* one more optimization
2024-08-29 16:05:40 -07:00
Alex Barron
28be4de7c2
Fix JIT reductions ( #1373 )
2024-08-28 16:39:11 -07:00
Awni Hannun
a6c3b38fba
Async load ( #1372 )
...
* async load
* async load
2024-08-28 14:21:55 -07:00
Angelos Katharopoulos
cdb59faea6
Adds send/recv ops in distributed ( #1366 )
2024-08-26 23:01:37 -07:00
Awni Hannun
5f7d19d1f5
MPI ops in GPU stream for faster comms ( #1356 )
2024-08-26 15:12:50 -07:00
Awni Hannun
2fdf9eb535
Fix ternary for large arrays ( #1359 )
...
* fix ternary for large arrays
* fix
2024-08-26 11:22:27 -07:00
Awni Hannun
860d3a50d7
fix extension metal library finding ( #1361 )
2024-08-26 09:18:50 -07:00
Angelos Katharopoulos
8081df79be
Fix boolean all reduce bug ( #1355 )
2024-08-24 10:09:32 -07:00
Nripesh Niketan
64bec4fad7
Chore: update pre-commit hooks ( #1353 )
...
* Chore: update pre-commit refs
* run pre-commit
2024-08-24 06:46:36 -07:00
Alex Barron
b96e105244
Add grid_sample example to metal_kernel docs ( #1352 )
...
* Add `zero_outputs` and `atomic_outputs` options to `metal_kernel`
* add grid sample to docs
* zero_outputs -> init_value
* add missing header for linux
2024-08-23 18:24:16 -07:00
Angelos Katharopoulos
b57a52813b
Further reduction tuning ( #1349 )
...
* More reduction tuning
* Forgotten pdb
* Small column long row specialization
2024-08-23 10:35:25 -07:00
Awni Hannun
98b6ce3460
Refactor reductions and fix scatter atomics for large sizes ( #1300 )
...
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-08-22 16:03:31 -07:00
Alex Barron
0fd2a1f4b0
Custom Metal Kernels from Python ( #1325 )
...
* start
* simple kernels working
* restructure
* inverse example working
* docs + fixes
* missing file
* fix imports
* address comments
* add docs + fix test
* Review comments + refactor to a single function
* update docs
* remove hashing
* fix contig bug in test
* back to a class
* trailing whitespace
* fix tests
* match c++ and python apis
* add link + make args kw_only
2024-08-22 13:46:29 -07:00
Awni Hannun
df3233454d
2d gather specialization ( #1339 )
2024-08-22 10:48:24 -07:00
Awni Hannun
d40e76809f
Fix rope ( #1340 )
...
* add test
* fix rope
* fix test
2024-08-20 17:37:52 -07:00