[CUDA] Simplify allocator (#2392)

* simplify allocator and fix race with small pool

* Don't use shared event in worker

* use cuda buffer in small pool

* comment

* comment
Awni Hannun
2025-07-22 08:24:01 -07:00
committed by GitHub
parent 74eccbf3fa
commit 1e496ddb82
9 changed files with 100 additions and 162 deletions

@@ -128,8 +128,7 @@ Buffer MetalAllocator::malloc(size_t size) {
   auto pool = metal::new_scoped_memory_pool();
-  // If we have a lot of memory pressure or are over the maximum cache size,
-  // try to reclaim memory from the cache
+  // If we have a lot of memory pressure try to reclaim memory from the cache
   if (mem_required >= gc_limit_ || num_resources_ >= resource_limit_) {
     num_resources_ -=
         buffer_cache_.release_cached_buffers(mem_required - gc_limit_);