* Use async cuda malloc managed with cuda 13
* add pool threshold
* refactor for regular cuda malloc
* load eval gpu for cuda
* remove use of cuda pool, use cuda free async
* fix
* fix
* fix
* fix
* fix + comment
* [CUDA] Update cudaMemAdvise and cudaGraphAddDependencies for CUDA 13
These functions' signatures changed in CUDA 13, so we differentiate
between CUDA 13 and preceding releases at compile time.
* Mention NVIDIA in ACKNOWLEDGMENTS.md