mlx/mlx/fence.h
Awni Hannun df58b4133a
[CUDA] Reduce use of managed memory (#2725)
* Use async cuda malloc managed with cuda 13

* add pool threshold

* refactor for regular cuda malloc

* load eval gpu for cuda

* remove use of cuda pool, use cuda free async

* fix

* fix

* fix

* fix

* fix + comment
2025-11-05 16:05:23 -08:00


// Copyright © 2024 Apple Inc.

#pragma once

#include <memory> // std::shared_ptr
#include <vector>

#include "mlx/array.h"

namespace mlx::core {

/* A fence to be used for synchronizing work between streams.
 *
 * A call to `wait` waits in the given stream until all previous calls to
 * `update` have completed on their respective streams.
 *
 * The array passed to `update` is computed and visible after the call to
 * `wait` returns. The array passed to `wait` will not be read until all
 * previous calls to `update` have completed.
 *
 * Note: calls to `update` should always be made from the same thread, or be
 * explicitly synchronized, so that they occur in sequence. Calls to `wait`
 * can be made from any thread.
 *
 * For the Metal back-end the fence supports a slow (default) mode and a fast
 * mode. Fast mode requires setting the environment variable
 * `MLX_METAL_FAST_SYNCH=1`. Fast mode also requires Metal 3.2+ (macOS 15+,
 * iOS 18+).
 */
class Fence {
 public:
  Fence() {}
  explicit Fence(Stream stream);

  void update(Stream stream, const array& x, bool cross_device);
  void wait(Stream stream, const array& x);

 private:
  std::shared_ptr<void> fence_{nullptr};
};

} // namespace mlx::core
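
The ordering contract in the header comment (a waiter does not read until prior updates have completed) can be illustrated with a CPU-only analogue. This is a minimal sketch and not MLX's implementation: `ToyFence`, its counter-based `update()` / `wait(n)` signatures, and `run_demo` are hypothetical names invented for illustration, and plain threads stand in for streams.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU-only analogue of a fence. It does not use or mirror
// the real mlx::core::Fence internals.
class ToyFence {
 public:
  // Called by the producer after its result has been written.
  void update() {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      ++completed_;
    }
    cv_.notify_all();
  }

  // Called by a consumer: block until at least `n` updates have completed.
  void wait(std::uint64_t n) {
    std::unique_lock<std::mutex> lock(mtx_);
    cv_.wait(lock, [&] { return completed_ >= n; });
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  std::uint64_t completed_{0};
};

// The producer thread writes a value and then signals the fence. The
// consumer waits on the fence before reading, so the write is visible.
int run_demo() {
  ToyFence fence;
  int value = 0;

  std::thread producer([&] {
    value = 42; // the "computation"
    fence.update(); // publish: the result is now complete
  });

  fence.wait(1); // wait for the first update
  int seen = value; // ordered after the write by the mutex in update/wait
  producer.join();
  return seen;
}
```

The monotonic counter is one simple way to let a waiter target "all updates so far". As in the note on `update` above, the sketch assumes a single producer issuing updates in sequence.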