[CUDA] Simplify allocator (#2392)

* simplify allocator and fix race with small pool
* Don't use shared event in worker
* use cuda buffer in small pool
* comment
* comment
2025-10-21 18:28:11 +08:00 · 2025-07-22 08:24:01 -07:00
parent 74eccbf3fa
commit 1e496ddb82
9 changed files with 100 additions and 162 deletions
--- a/mlx/backend/metal/allocator.cpp
+++ b/mlx/backend/metal/allocator.cpp
@@ -128,8 +128,7 @@ Buffer MetalAllocator::malloc(size_t size) {

    auto pool = metal::new_scoped_memory_pool();

-    // If we have a lot of memory pressure or are over the maximum cache size,
-    // try to reclaim memory from the cache
+    // If we have a lot of memory pressure try to reclaim memory from the cache
    if (mem_required >= gc_limit_ || num_resources_ >= resource_limit_) {
      num_resources_ -=
          buffer_cache_.release_cached_buffers(mem_required - gc_limit_);