Alex Barron
d0da74209b
version bump ( #1260 )
2024-07-11 11:17:55 -07:00
Angelos Katharopoulos
5c1fa64fb0
Custom transforms ( #1246 )
2024-07-10 18:00:01 -07:00
Alex Barron
a3c287354f
Fast Hadamard Transform ( #1249 )
* Working hadamard for powers of 2
* working for m*2^k
* add scale and check contiguity
* add size check
* clean up
* fix test
* add grads + vmap
* gpu only
* skip on linux
* test typo
* add cpu impl
* remove gpu only tests
* fix linux build + add is_equivalent
2024-07-09 20:39:01 -07:00
Angelos Katharopoulos
03cf033f82
Fix reshape copy bug ( #1253 )
2024-07-07 21:37:00 -07:00
Alex Barron
bdb36c9a63
add zero vjps for bitwise ops and gather w.r.t. index ( #1256 )
2024-07-07 21:34:59 -07:00
Awni Hannun
20bb301195
CPU binary reduction + Nits ( #1242 )
* very minor nits
* reduce binary
* fix test
2024-06-28 13:50:42 -07:00
Awni Hannun
d6383a1c6a
version bump ( #1239 )
2024-06-27 10:43:13 -07:00
Angelos Katharopoulos
b05bcfd27f
Fixes segfault when compiling checkpointed functions ( #1235 )
2024-06-26 16:14:45 -07:00
Alex Barron
2615660e62
Fix strided sort bug ( #1236 )
* Use output strides in sort kernel
* fix zero strides bug
2024-06-26 14:32:11 -07:00
Awni Hannun
5b0af4cdb1
fix donation condition for compilation ( #1237 )
2024-06-26 09:04:05 -07:00
Jagrit Digani
8c2e15e6c8
Accelerate import updates for iOS ( #1227 )
* Update veclib and bnns includes to #include <Accelerate/Accelerate.h> for compatibility with iOS
* Mark float literals in softmax.cpp as float16_t to fix errors on iOS
* Add arm neon vector operation guards
* Redirect to common backend for consistency
2024-06-26 09:01:50 -07:00
Awni Hannun
56c8a33439
Get metal version from xcode ( #1228 )
* get metal version from xcode
* typo
* fix
2024-06-26 07:02:11 -07:00
David Koski
4eef1e8a3e
fix typo ( #1215 )
2024-06-24 13:36:35 -07:00
Alex Barron
95d11bda06
Fix NumPy 2.0 pickle test ( #1221 )
* fix numpy version <2 temporarily
* typo
* better fix
* Fix just for bfloat16
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-23 05:47:22 -07:00
Awni Hannun
af9079cc1f
version bump ( #1212 )
2024-06-14 11:28:51 -07:00
Jagrit Digani
2d6cd47713
Masked gemv ( #1211 )
2024-06-14 09:52:26 -07:00
Awni Hannun
fe3167d7ea
smaller CPU binary ( #1203 )
* smaller CPU binary
* fix no cpu build
2024-06-14 09:46:55 -07:00
Awni Hannun
31e134be35
Build for macOS 15 ( #1208 )
* Build for macOS 15
* metal32 as well
* comment
---------
Co-authored-by: Awni Hannun <Awni Hannun>
2024-06-13 13:31:44 -07:00
Awni Hannun
e84ba8056d
only allow openmpi ( #1209 )
2024-06-13 12:14:44 -07:00
Fangjun Kuang
f20e97b092
minor fixes ( #1194 )
* minor fixes
* fix build errors
2024-06-12 22:06:49 -07:00
Alex Barron
934683088e
Refactor JIT for unary/binary/ternary ops ( #1206 )
* refactor unary/binary/ternary ops
* get_primitive_string util
---------
2024-06-12 14:22:12 -07:00
Awni Hannun
de2b9e7d0a
Fix kernel deps to reduce build times ( #1205 )
2024-06-12 11:17:39 -07:00
Alex Barron
dd7d8e5e29
Add Quantized Ops to the JIT ( #1204 )
* JIT for quantized ops
* remove unused imports
* address comments
* fix imports
* second attempt to fix imports
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-12 09:47:12 -07:00
Awni Hannun
df964132fb
fix scatter + test ( #1202 )
* fix scatter + test
* fix test warnings
* fix metal validation
2024-06-11 14:35:12 -07:00
Awni Hannun
709ccc6800
install mpi for release build ( #1199 )
2024-06-10 10:09:32 -07:00
Awni Hannun
cf236fc390
version ( #1191 )
2024-06-06 17:16:40 -07:00
Alex Barron
27d70c7d9d
Feature complete Metal FFT ( #1102 )
* feature complete metal fft
* fix contiguity bug
* jit fft
* simplify rader/bluestein constant computation
* remove kernel/utils.h dep
* remove bf16.h dep
* format
---------
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-06 12:57:25 -07:00
nicolov
0e585b4409
Add docstring for scatter ( #1189 )
* Add docstring for scatter
* docs nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-06 11:51:25 -07:00
Angelos Katharopoulos
0163a8e57a
Add docs for the distributed namespace ( #1184 )
2024-06-06 11:37:00 -07:00
Awni Hannun
578842954c
fix jit scan when output doesn't have primitive ( #1190 )
2024-06-06 07:24:58 -07:00
Awni Hannun
496315fe1d
Fix scan ( #1188 )
* fix scan
* improve grid size
* fix cpu cummax
2024-06-05 14:21:58 -07:00
Angelos Katharopoulos
0fe6895893
Fix the hard-shrink test ( #1185 )
2024-06-04 16:22:56 -07:00
Nikhil Mehta
0b7d71fd2f
Add softmin, hardshrink, hardtanh ( #1180 )
---------
Co-authored-by: Nikhil Mehta <nikmehta@tesla.com>
2024-06-04 15:48:18 -07:00
Awni Hannun
83b11bc58d
Fix Metal API validation for empty concat ( #1183 )
2024-06-04 13:17:08 -07:00
Alex Barron
375a8bbdcc
Add some internal GPU apis ( #1177 )
* Add unary/binary/ternary/slice/concat internal GPU ops
* add pad internal op
* formatting + no_cpu fix
2024-06-04 09:24:26 -07:00
Awni Hannun
ea9090bbc4
Add view op ( #1179 )
* add view primitive
* nit
* fix view
2024-06-04 08:05:27 -07:00
nicolov
81def6ac76
Fix benchmark ( #1175 )
2024-06-04 07:50:46 -07:00
Angelos Katharopoulos
3de8ce3f3c
In place all-reduce and forgiving init ( #1178 )
2024-06-03 16:47:47 -07:00
Alex Barron
4d485fca24
Add defines include ( #1176 )
Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-03 09:50:10 -07:00
Brian Keene
1865299a30
Metal shaders for memory efficient self attention on large sequences ( #964 )
* Metal shaders for efficient self attention on large sequences
Updated fast attention: GEMM-ified with Steel primitives
Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI
2024-06-03 09:16:19 -07:00
Dominik Schlösser
3576b547c5
Doc error for default for scale in SinusoidalPositionalEncoding ( #1174 )
2024-06-02 13:42:45 -07:00
Awni Hannun
079882495d
version bump ( #1172 )
2024-05-31 12:29:12 -07:00
K Venkat Ramnan
ab977109db
feat: Added dlpack device ( #1165 )
* feat: Added dlpack device
* feat: Added device_id to dlpack device
* feat: Added device_id to dlpack device
* doc: updated conversion docs
* doc: updated numpy.rst dlpack information
* doc: updated numpy.rst dlpack information
* Update docs/src/usage/numpy.rst
* Update docs/src/usage/numpy.rst
---------
Co-authored-by: Venkat Ramnan Kalyanakumar <venkatramnankalyanakumar@Venkats-MacBook-Air.local>
Co-authored-by: Awni Hannun <awni.hannun@gmail.com>
2024-05-31 12:29:01 -07:00
Awni Hannun
fd1c08137b
stable cumprod grad at 0 ( #1167 )
2024-05-31 12:28:42 -07:00
Jagrit Digani
76b6cece46
Fix multi-block sort stride management ( #1169 )
* Fix multi-block sort stride management
* Add seed to tests
2024-05-31 11:10:54 -07:00
Jagrit Digani
9f0df51f8d
Fix matvec vector stride bug ( #1168 )
2024-05-29 12:18:28 -07:00
Awni Hannun
e7a2a3dcd1
Fix a couple bugs ( #1161 )
* fix jit reduce for RMS norm
* make strides a single buffer
* better eval error message
* fix compiling with inf and bf16
* fix cpu compile with bf16
2024-05-28 15:18:18 -07:00
Awni Hannun
a87ef5bfc1
fix broadcast bug in bitwise ops ( #1157 )
2024-05-24 11:44:40 -07:00
Awni Hannun
9f9cb7a2ef
version bump ( #1154 )
2024-05-23 18:08:08 -07:00
Awni Hannun
7e26fd8032
Option to JIT steel gemm / conv ( #1139 )
2024-05-23 18:07:34 -07:00