Awni Hannun
1bf605d56d
use arch specific targets when possible ( #2771 )
2025-11-14 20:04:18 -08:00
Angelos Katharopoulos
0073096dd1
Split name into directories for cuda jit ( #2656 )
2025-10-07 01:52:58 -07:00
Awni Hannun
dc371ae7a5
fix for max block dim ( #2631 )
2025-09-29 08:59:25 -07:00
Cheng
a9bac3d9e5
Run CPP tests for CUDA build in CI ( #2544 )
2025-08-27 08:06:46 +09:00
Angelos Katharopoulos
e397177f6e
Custom cuda kernel ( #2517 )
2025-08-20 17:20:22 -07:00
Awni Hannun
b9e88fb976
[CUDA] Fix segfault on exit ( #2424 )
...
* fix cuda segfault on exit
* comment
2025-07-27 08:08:13 -07:00
Cheng
31fc530c76
[CUDA] Add more ways finding CCCL headers in JIT ( #2382 )
2025-07-17 15:25:34 -07:00
Angelos Katharopoulos
6b1b8ea91b
[CUDA] Add work per thread to compile ( #2368 )
2025-07-17 06:47:52 -07:00
Cheng
cb349a291c
[CUDA] Use cuda::std::complex in place of cuComplex ( #2372 )
2025-07-15 00:36:13 -07:00
Cheng
6325f60d52
[CUDA] Bundle CCCL for JIT compilation ( #2357 )
...
* Ship CCCL for JIT compilation
* Remove cexpf
2025-07-11 18:45:37 -07:00
Cheng
8347575ba1
[CUDA] Implement Scan kernel ( #2347 )
...
* Contiguous scan
* Strided scan
* Enable tests
* Fix failing logaddexp test
* Use cexpf in Metal
2025-07-10 16:54:12 -07:00
Cheng
afb9817599
[CUDA] Put version in ptx cache dir path ( #2352 )
2025-07-10 07:24:21 -07:00
Awni Hannun
ec0d5db67b
[CUDA] Switch to CUDA graphs ( #2317 )
...
* cuda graph prototype
fix signal bug + start to add dependencies
capture more
capture more ops
remaining ops
fix reduce and rope deps
add concurrent context
try update, but not working
cosistent topology order
use node api
use node api directly to reduce overhead
fix bug
use kernels in unary
cache graph
format
fix synchronization
format
* comment
2025-07-02 15:59:13 -07:00
Angelos Katharopoulos
b3d7b85376
Make ptx cache settable by environment variable ( #2304 )
2025-06-17 23:55:56 -07:00
Awni Hannun
6871e2eeb7
fix cuda jit ( #2287 )
2025-06-13 19:21:46 -07:00
Cheng
c8b4787e4e
CUDA backend: indexing ops ( #2277 )
2025-06-12 21:44:19 -07:00
Cheng
a4fc671d3e
CUDA backend: compile ( #2276 )
...
* CUDA backend: compile
* Rename kernels/ to device/
2025-06-12 17:08:39 -07:00