mlx/mlx/backend/cuda
Awni Hannun cad5c0241c
[CUDA] synch properly waits for all tasks to finish and clear (#2303)
* cuda synch properly waits for all tasks to finish and clear

* fix copy
2025-06-17 12:03:25 -07:00
copy [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
device divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
iterators CUDA backend: argreduce (#2270) 2025-06-11 13:26:17 -07:00
reduce CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
allocator.cpp [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
allocator.h Avoid invoking allocator::malloc when creating CUDA event (#2232) 2025-06-03 16:48:40 -07:00
arg_reduce.cu Fix cuda arg reduce (#2291) 2025-06-14 17:54:00 -07:00
bin2h.cmake CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
binary_two.cu divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
binary.cu divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
CMakeLists.txt divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
compiled.cpp Cuda bug fixes 2 (#2298) 2025-06-16 13:14:46 -07:00
copy.cu [CUDA] Fix back-end bugs and enable corresponding tests (#2296) 2025-06-16 08:45:40 -07:00
cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
cuda.h start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
device.cpp [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
device.h [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
eval.cpp [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
event.cu Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
event.h CUDA backend: backbone (#2075) 2025-05-06 21:26:46 -07:00
fence.cpp Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
indexing.cpp Cuda bug fixes 2 (#2298) 2025-06-16 13:14:46 -07:00
jit_module.cpp fix cuda jit (#2287) 2025-06-13 19:21:46 -07:00
jit_module.h CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
kernel_utils.cu RoPE for CUDA (#2293) 2025-06-15 06:08:07 -07:00
kernel_utils.cuh [CUDA] Fix back-end bugs and enable corresponding tests (#2296) 2025-06-16 08:45:40 -07:00
layer_norm.cu [CUDA] RMSNorm and VJP (#2280) 2025-06-12 17:09:49 -07:00
logsumexp.cu CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
matmul.cpp Fix cuda arg reduce (#2291) 2025-06-14 17:54:00 -07:00
no_cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
primitives.cu divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
random.cu RoPE for CUDA (#2293) 2025-06-15 06:08:07 -07:00
reduce.cu CUDA backend: reduce (#2269) 2025-06-11 11:22:25 -07:00
rms_norm.cu [CUDA] RMSNorm and VJP (#2280) 2025-06-12 17:09:49 -07:00
rope.cu RoPE for CUDA (#2293) 2025-06-15 06:08:07 -07:00
slicing.cpp rebase + nit (#2260) 2025-06-10 10:51:51 -07:00
softmax.cu CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
sort.cu divmod, partition, sort fixes (#2302) 2025-06-16 18:49:32 -07:00
ternary.cu Cuda bug fixes 2 (#2298) 2025-06-16 13:14:46 -07:00
unary.cu Cuda bug fixes 2 (#2298) 2025-06-16 13:14:46 -07:00
utils.cpp Cuda bug fixes 2 (#2298) 2025-06-16 13:14:46 -07:00
utils.h CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
worker.cpp [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
worker.h CUDA backend: backbone (#2075) 2025-05-06 21:26:46 -07:00