Awni Hannun
ec0d5db67b
[CUDA] Switch to CUDA graphs ( #2317 )
...
* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
cosistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment
2025-07-02 15:59:13 -07:00
Awni Hannun
cfb6a244ea
allow parameters to be deleted ( #2325 )
2025-07-01 21:27:23 -07:00
Awni Hannun
dd4f53db63
use fp32 for testing, add more complex ops ( #2322 )
2025-07-01 07:30:00 -07:00
Awni Hannun
33bf1a244b
Fix module update in strict mode ( #2321 )
...
* fix module update in strict mode
* allow GELU to be pickled
2025-06-29 11:12:29 -07:00
Angelos Katharopoulos
772f471ff2
[CUDA] Fix reductions ( #2314 )
2025-06-27 12:59:20 -07:00
Angelos Katharopoulos
2c11d10f8d
Split broadcast so it is always fused in compile ( #2318 )
2025-06-26 22:08:18 -07:00
Awni Hannun
81bb9a2a9e
Compile float64 functions on CPU ( #2311 )
2025-06-24 10:18:52 -07:00
Angelos Katharopoulos
5adf185f86
Fix update_modules()
when providing a subset ( #2308 )
2025-06-20 17:19:46 -07:00
Awni Hannun
cad5c0241c
[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )
...
* cuda synch properly waits for all tasks to finish and clear
* fix copy
2025-06-17 12:03:25 -07:00
Awni Hannun
b8022c578a
divmod, partition, sort fixes ( #2302 )
2025-06-16 18:49:32 -07:00
Awni Hannun
bc53f8293f
Cuda bug fixes 2 ( #2298 )
...
* more bug fixes
* more bug fixes
* format
2025-06-16 13:14:46 -07:00
Awni Hannun
c552ff2451
[CUDA] Fix back-end bugs and enable corresponding tests ( #2296 )
...
* Fix some cuda back-end bugs and enable corresponding tests
* more fixes
* enable more tests
* format
2025-06-16 08:45:40 -07:00
Awni Hannun
4fda5fbdf9
add python testing for cuda with ability to skip list of tests ( #2295 )
2025-06-15 10:56:48 -07:00
Awni Hannun
8402a2acf4
Fix complex power and print ( #2286 )
...
* fix complex power and print
* fix complex matmul shape
2025-06-13 11:13:00 -07:00
Awni Hannun
c35f4d089a
start cuda circle config ( #2256 )
...
* rebase
* fix metal kernel linking issue on cuda
* start cuda circle config
2025-06-10 21:19:47 -07:00
Angelos Katharopoulos
8590c0941e
Add load_safe to the general conv loaders ( #2258 )
2025-06-10 20:58:16 -07:00
Awni Hannun
62fecf3e13
fix conv export ( #2265 )
2025-06-10 09:34:01 -07:00
Awni Hannun
9ce77798b1
fix export to work with gather/scatter axis ( #2263 )
2025-06-09 20:37:27 -07:00
Emmanuel Ferdman
5866b3857b
Refactor the lu test ( #2250 )
...
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-07 06:12:08 -07:00
Awni Hannun
1ca616844b
Fix unintuitive metal kernel caching ( #2242 )
...
* Fix unintuitive metal kernel caching
* alternative solution
2025-06-06 20:08:15 -07:00
Awni Hannun
c763fe1be0
default strict mode for module update and update_modules ( #2239 )
2025-06-05 15:27:02 -07:00
Suryash Malviya
0408ba0a76
Optimizing Complex Matrix Multiplication using Karatsuba’s Algorithm ( #2220 )
...
* Implementing Complex Matmul using Karatsuba Algorithm
* Implemented Karatsuba's Algorithm for complex matmul and pre-commit them
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-06-02 15:58:46 -07:00
Awni Hannun
6ef2f67e7f
5bit quants ( #2226 )
...
* 5bit quants
* 5bit quants
2025-05-30 12:12:10 -07:00
Angelos Katharopoulos
0359bf02c9
Nearest upsample ( #2202 )
2025-05-19 11:23:38 -07:00
Awni Hannun
8576e6fe36
fix conv2d bug + faster conv 1d ( #2195 )
...
* fix conv2d bug + faster conv 1d
* revert sort + flaky test
2025-05-18 06:05:11 -07:00
Angelos Katharopoulos
0654543dcc
Add complex eigh ( #2191 )
2025-05-18 00:18:43 -07:00
Awni Hannun
602f43e3d1
fix conv grad ( #2187 )
2025-05-15 19:20:36 -07:00
Awni Hannun
a2cadb8218
real and imag properties ( #2189 )
2025-05-15 18:17:50 -07:00
Awni Hannun
c1eb9d05d9
non-symmetric eig and eigh ( #2188 )
2025-05-15 13:01:44 -07:00
Angelos Katharopoulos
cf6c939e86
Fix some complex vjps ( #2178 )
2025-05-14 23:37:12 -07:00
Angelos Katharopoulos
130df35e1b
Add random normal distribution for complex numbers ( #2182 )
2025-05-13 22:43:45 -07:00
Angelos Katharopoulos
3aa9cf3f9e
Fix put_along_axis for empty arrays ( #2181 )
2025-05-13 14:27:53 -07:00
Awni Hannun
8f3d208dce
Close a couple edge case bugs: hadamard and addmm on empty inputs ( #2177 )
...
* handle hadamard and addmm on empty inputs
* fix
2025-05-12 10:48:57 -07:00
ATurker
a7fae8a176
fix: conv_general differences between gpu, cpu ( #2070 )
...
* fix general_conv padding
* fix bugs
* add test
---------
Co-authored-by: Awni Hannun <awni@apple.com>
2025-05-09 10:26:52 -07:00
Awni Hannun
af705590ac
fix batched vector sdpa ( #2152 )
2025-05-05 13:13:03 -07:00
Angelos Katharopoulos
481349495b
GPU Hadamard for large N ( #1879 )
2025-05-01 17:19:17 -07:00
Awni Hannun
9daa6b003f
fix shapeless export ( #2148 )
2025-05-01 15:02:02 -07:00
Awni Hannun
aa5d84f102
Allow quant layer to be unfrozen ( #2142 )
2025-04-30 09:08:29 -07:00
Aashiq Dheeraj
bb6565ef14
add fftshift and ifftshift fft helpers ( #2135 )
...
* add fftshift and ifftshift fft helpers
* address comments
* axes have to be iterable
* fix fp error in roll + add test
---------
Co-authored-by: Aashiq Dheeraj <aashiq@aashiq-mbp-m4.local>
2025-04-29 22:13:45 -07:00
Awni Hannun
7bb063bcb3
Enable vjp for quantized scale and bias ( #2129 )
...
* Enable vjp for quantized scale and bias
* higher tol
2025-04-29 13:03:09 -07:00
Awni Hannun
fbc89e3ced
fix pinv ( #2110 )
2025-04-23 13:08:28 -07:00
Param Thakkar
600e87e03c
Added output_padding parameters in conv_transpose ( #2092 )
2025-04-23 09:26:33 -07:00
Hyunsung Lee
3836445241
Add broadcast_shapes in python API ( #2091 )
2025-04-22 18:57:39 -07:00
Yury Popov
1d2c9d6a07
Complex scan ( #2094 )
2025-04-22 18:56:28 -07:00
Awni Hannun
e8ac6bd2f5
irfft throws instead of segfaults on scalars ( #2109 )
2025-04-22 10:25:55 -07:00
Awni Hannun
fdadc4f22c
Add more complex unary ops ( #2101 )
2025-04-21 13:04:54 -07:00
Awni Hannun
79b527f45f
conv vmap ( #2102 )
2025-04-21 13:04:39 -07:00
Angelos Katharopoulos
5de6d94a90
Gather qmm batched kernel and refactoring of quantized ( #2078 )
2025-04-17 13:53:11 -07:00
Angelos Katharopoulos
99eefd2ec0
Gather mm new kernel and small refactoring ( #2040 )
2025-04-14 16:37:36 -07:00
Yury Popov
e9e268336b
LogCumSumExp ( #2069 )
2025-04-13 01:27:29 -07:00