Awni Hannun
bb1b76d9dc
RoPE with frequencies as optional input ( #1337 )
...
* start rope with freq input
* rope with frequencies
* nits
* fix bug
* fix bug + test
* cleanup
* optional base
2024-08-19 18:30:50 -07:00
Awni Hannun
eaaea02010
Add isfinite
( #1318 )
...
* isfinite
* remove reduce test since fix is not complete
2024-08-13 14:49:28 -07:00
Brian Keene
19fb69e2ed
Add memory_efficient_threshold kwarg to sdpa kernel ( #1319 )
...
Allows opt-in to memory efficient GPU shader at proscribed sequence
length. Otherwise, utilizes aggregate MLX primitives for best latency.
2024-08-12 12:57:09 -07:00
Awni Hannun
9231617eb3
Move to nanobind v2 ( #1316 )
2024-08-08 17:17:46 -07:00
Alex Barron
32668a7317
CPU mx.linalg.cholesky_inverse and mx.linalg.tri_inv ( #1307 )
...
* add cholesky inv + tri inv
* always run tri_inv on cpu
* consistent naming
2024-08-08 15:18:02 -07:00
Alex Barron
635ccd9e25
Add "edge" mode to mx.pad ( #1309 )
...
* Add edge padding mode
* fix pad in pooling
* string arg instead of enum
2024-08-06 11:23:10 -07:00
Awni Hannun
10b5835501
fix creating array from bf16 tensors in jax / torch ( #1305 )
2024-08-01 16:20:51 -07:00
Awni Hannun
40b6d67333
Fixes for large arrays with a few ops ( #1299 )
...
* fixes for large arrays with a few ops
* fix bug
* fix all of copy
2024-07-30 17:18:39 -07:00
Alex Barron
c52d1600f0
Fused Affine Quantize/Dequantize ops ( #1282 )
...
* Add fast affine dequantize
* add full quantize kernel
* fused kernel with scale/bias computation
* fix docstring
* fix no jit error
* fix test
* test fix
* reduce fast api to only affine_quantize
2024-07-29 15:11:38 -07:00
Awni Hannun
aa1d6cadad
Fix docs latex build and nits ( #1297 )
...
* fix docs latex build and nits
* fix stub gen and try to clean up building
2024-07-29 11:44:06 -07:00
Awni Hannun
7b456fd2c0
Array api ( #1289 )
...
* some updates for numpy 2.0 and array api
* some updates for numpy 2.0 and array api
* fix array api doc
2024-07-26 10:40:49 -07:00
Anton Belov
5029894662
[Issue #1187 ] Add nan_to_num function initial attempt ( #1247 )
...
* initial attempt, working with wrong types
* not compiling; mx.float16 and mx.bfloat16 tests added
* fix nan to num
* nit
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-07-25 09:57:37 -07:00
Awni Hannun
baf9fa5f42
Einsum ( #1269 )
...
* einsum initial
* fix comma break
* sum axis was wrong
* small cleanups
* python binding
* changed bindings to resemble numpy
* remove todo comment
* comment changes
* add count of operands/inputs
* fail fast if operands list is empty
* ignore comma if no output
* einsum path matching numpy
* getting somewhere with path
* remove print
* it passes the first test
* moved einsum tests to seperate file
* seperated einsum path
* moved einsum naive
* remove space from equation
* fast fail if no operands passed
* update tests and remove printf
* small cleanup
* some more cleanups
* removed python helper file
* ack
* utilize std for finding min in vector
* duplicate def
* remove the tuple as it was unreadable
* moved einsum_naive back to ops
* remaining isn't needed
* avoid creating another set
* cleanup
* greedy path, start of naive einsum
* more einsum
* fix some bugs
* some more fixes, tests pass
* benchmark
* some simplify
* fix einsum and test
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
* add a bunch more tests and fix a bunch more bugs
* some docs nits
---------
Co-authored-by: dc-dc-dc <dgcruz983@gmail.com>
Co-authored-by: Angelos Katharopoulos <a_katharopoulos@apple.com>
2024-07-25 09:36:44 -07:00
fgranqvist
50eff6a10a
Implement sampling from laplace distribution. ( #1279 )
2024-07-24 15:15:37 +02:00
Alex Barron
c34a5ae7f7
Fix bfloat16 Hadamard ( #1283 )
...
* fix bfloat16 hadamard
* add scale
* review comments
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-07-23 14:54:43 -07:00
Awni Hannun
e2aa6ec8ae
some fixes ( #1281 )
2024-07-23 11:49:05 -07:00
Awni Hannun
218047c75a
docs fixes ( #1263 )
2024-07-11 15:59:07 -07:00
Angelos Katharopoulos
5c1fa64fb0
Custom transforms ( #1246 )
2024-07-10 18:00:01 -07:00
Alex Barron
a3c287354f
Fast Hadamard Transform ( #1249 )
...
* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent
2024-07-09 20:39:01 -07:00
Awni Hannun
20bb301195
CPU binary reduction + Nits ( #1242 )
...
* very minor nits
* reduce binary
* fix test
2024-06-28 13:50:42 -07:00
Angelos Katharopoulos
b05bcfd27f
Fixes segfault when compiling checkpointed functions ( #1235 )
2024-06-26 16:14:45 -07:00
David Koski
4eef1e8a3e
fix typo ( #1215 )
2024-06-24 13:36:35 -07:00
Alex Barron
95d11bda06
Fix NumPy 2.0 pickle test ( #1221 )
...
* fix numpy version <2 temporarily
* typo
* better fix
* Fix just for bfloat16
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-23 05:47:22 -07:00
Angelos Katharopoulos
0163a8e57a
Add docs for the distributed namespace ( #1184 )
2024-06-06 11:37:00 -07:00
Awni Hannun
ea9090bbc4
Add view op ( #1179 )
...
* add view primitive
* nit
* fix view
2024-06-04 08:05:27 -07:00
Angelos Katharopoulos
3de8ce3f3c
In place all-reduce and forgiving init ( #1178 )
2024-06-03 16:47:47 -07:00
Brian Keene
1865299a30
Metal shaders for memory efficient self attention on large sequences ( #964 )
...
* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
2024-06-03 09:16:19 -07:00
K Venkat Ramnan
ab977109db
feat: Added dlpack device ( #1165 )
...
* feat: Added dlpack device
* feat: Added device_id to dlpack device
* feat: Added device_id to dlpack device
* doc: updated conversion docs
* doc: updated numpy.rst dlpack information
* doc: updated numpy.rst dlpack information
* Update docs/src/usage/numpy.rst
* Update docs/src/usage/numpy.rst
---------
Co-authored-by: Venkat Ramnan Kalyanakumar <venkatramnankalyanakumar@Venkats-MacBook-Air.local>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-05-31 12:29:01 -07:00
Awni Hannun
7e26fd8032
Option to JIT steel gemm / conv ( #1139 )
2024-05-23 18:07:34 -07:00
Angelos Katharopoulos
50dfb664db
Comms ( #1097 )
...
* Start the communications branch using MPI
* Add ops and primitives
* Add python bindings for distributed
2024-05-23 17:04:02 -07:00
Awni Hannun
eb8321d863
list based indexing ( #1150 )
2024-05-22 15:52:05 -07:00
Abe Leininger
79ef49b2c2
add mx.trace ( #1143 ) ( #1147 )
...
* working c++ trace implementation
* updated throw + added overloads
* added python binding for trace function
* pre-commit reformatting
* add trace to docs
* resolve comments
* remove to_stream call
2024-05-22 15:50:27 -07:00
Awni Hannun
d568c7ee36
Rename block sparse ( #1149 )
...
* block_sparse_mm to gather_mm
* rename
* nit
* nit
2024-05-22 07:48:34 -07:00
Awni Hannun
e6fecbb3e1
Some fixes in docs ( #1141 )
...
* fixes in docs
* nit
2024-05-20 11:51:47 -07:00
Luca Arnaboldi
b3ec792380
Implemented Cholesky on CPU ( #1119 )
2024-05-17 12:31:59 -07:00
Awni Hannun
81dd33af66
allow conversion to dlpack ( #1120 )
2024-05-16 16:11:37 -07:00
Angelos Katharopoulos
e78a6518fa
Block sparse qmm ( #1124 )
2024-05-16 15:24:14 -07:00
Jacket
c417e42116
[Fix] minor typo in default argument for argpartition's "axis" parameter ( #1125 )
...
According to the document, argpartition's axis parameter can be None, but due to a previous typo it can't really accepts a None value.
2024-05-15 15:25:25 -07:00
Awni Hannun
631dfbe673
fix scatter index bug ( #1122 )
2024-05-14 15:04:58 -07:00
Cheng
56a4eaed72
Pass missing stream arg in array.flatten ( #1111 )
2024-05-14 06:50:16 -07:00
Cheng
bf925d9dc7
Move args in conv_general ( #1118 )
...
Also fix a typo that padding_lo is passed as padding_hi.
2024-05-14 06:50:09 -07:00
Cheng
1a7ed5dcb6
Fill vector with constructor instead of fill_n ( #1113 )
2024-05-14 06:28:55 -07:00
Cheng
cbd5445ea7
The tile op does not accept None as reps ( #1117 )
2024-05-14 06:25:25 -07:00
Max-Heinrich Laves
ff4223904d
Conv3d ( #993 )
...
* added conv3d
added conv3d
implemented explicit_gemm_conv_ND_cpu and bounds checks for slow_conv_3D
* incorporated reviewer comments
* fixed test
* reduced tensor shapes in test for conv3d
* Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
Reviewer suggestion
2024-05-11 06:15:02 -07:00
Alex Barron
2e158cf6d0
Add conjugate operator ( #1100 )
...
* cpu and gpu impl
* add mx.conj and array.conj()
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-05-10 07:22:20 -07:00
Awni Hannun
b21242faf1
Allow unary ops to accept array like ( #1093 )
2024-05-09 09:36:02 -07:00
Rahul Yedida
cc05a281c4
Added ArcTan2 operation ( #1079 )
...
* Added ArcTan2 operation
* Cleanup, bug fixes from code review
* Minor cleanup, fixed Linux tests
2024-05-08 08:35:15 -07:00
Awni Hannun
9814a2ae12
fix conversion to array ( #1070 )
2024-05-06 16:02:49 -07:00
Shubham
6992498e7a
add keyword positonal ( #1081 )
2024-05-06 07:18:49 -07:00
Awni Hannun
21623156a3
Reset peak memory ( #1074 )
...
* reset peak memory
* fix linux
* nits in docs
2024-05-03 17:12:51 -07:00