Awni Hannun
|
1e496ddb82
|
[CUDA] Simplify allocator (#2392)
* simplify allocator and fix race with small pool
* Don't use shared event in worker
* use cuda buffer in small pool
* comment
* comment
|
2025-07-22 08:24:01 -07:00 |
|
Awni Hannun
|
93d70419e7
|
[CUDA] speedup handling scalars (#2389)
* speedup scalars in cuda
* comment
|
2025-07-18 21:47:31 -07:00 |
|
Cheng
|
5685ceb3c7
|
Avoid invoking allocator::malloc when creating CUDA event (#2232)
|
2025-06-03 16:48:40 -07:00 |
|
Cheng
|
db5a7c6192
|
Add memory cache to CUDA backend (#2221)
* Move BufferCache out of allocator
* Add memory cache to cuda backend allocator
* Simplify BufferCache assuming buf can not be null
|
2025-05-30 12:12:54 -07:00 |
|
Cheng
|
0cae0bdac8
|
CUDA backend: backbone (#2075)
|
2025-05-06 21:26:46 -07:00 |
|