Andrey Portnoy
5722c147de
[CUDA] Update calls to cudaMemAdvise and cudaGraphAddDependencies for CUDA 13 ( #2525 )
...
* [CUDA] Update cudaMemAdvise and cudaGraphAddDependencies for CUDA 13
These functions' signatures changed in CUDA 13, so we differentiate
between CUDA 13 and preceding releases at compile time.
* Mention NVIDIA in ACKNOWLEDGMENTS.md
2025-08-21 19:57:20 -07:00
Awni Hannun
1e496ddb82
[CUDA] Simplify allocator ( #2392 )
...
* simplify allocator and fixe race with small pool
* Don't use shared event in worker
* use cuda buffer in small pool
* comment
* comment
2025-07-22 08:24:01 -07:00
Awni Hannun
93d70419e7
[CUDA] speedup handling scalars ( #2389 )
...
* speedup scalars in cuda
* comment
2025-07-18 21:47:31 -07:00
Awni Hannun
c9a9180584
Cuda perf tuning ( #2307 )
...
* perf tuning
* fix adding inputs arrays in matmul / srot
* format
* fix
2025-06-20 14:50:57 -07:00
Awni Hannun
cad5c0241c
[CUDA] synch properly waits for all tasks to finish and clear ( #2303 )
...
* cuda synch properly waits for all tasks to finish and clear
* fix copy
2025-06-17 12:03:25 -07:00
Cheng
5685ceb3c7
Avoid invoking allocator::malloc when creating CUDA event ( #2232 )
2025-06-03 16:48:40 -07:00
Cheng
db5a7c6192
Add memory cache to CUDA backend ( #2221 )
...
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf can not be null
2025-05-30 12:12:54 -07:00
Cheng
0cae0bdac8
CUDA backend: backbone ( #2075 )
2025-05-06 21:26:46 -07:00