mlx/mlx/backend/cuda
2025-07-10 07:24:21 -07:00
copy [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) 2025-07-09 18:48:43 -07:00
device Fix compilation with CUDA 11 (#2331) 2025-07-07 20:00:43 -07:00
iterators CUDA backend: argreduce (#2270) 2025-06-11 13:26:17 -07:00
reduce Fix compilation with CUDA 11 (#2331) 2025-07-07 20:00:43 -07:00
allocator.cpp Cuda perf tuning (#2307) 2025-06-20 14:50:57 -07:00
allocator.h Avoid invoking allocator::malloc when creating CUDA event (#2232) 2025-06-03 16:48:40 -07:00
arg_reduce.cu Fix compilation with CUDA 11 (#2331) 2025-07-07 20:00:43 -07:00
bin2h.cmake CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
binary_two.cu [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) 2025-07-09 18:48:43 -07:00
binary.cu [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) 2025-07-09 18:48:43 -07:00
CMakeLists.txt [CUDA] Fix reductions (#2314) 2025-06-27 12:59:20 -07:00
compiled.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
copy.cu Cuda perf tuning (#2307) 2025-06-20 14:50:57 -07:00
cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
cuda.h start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
device.cpp [CUDA] Set current device before cudaGraphLaunch (#2351) 2025-07-10 07:24:02 -07:00
device.h [CUDA] Set current device before cudaGraphLaunch (#2351) 2025-07-10 07:24:02 -07:00
eval.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
event.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
event.h CUDA backend: backbone (#2075) 2025-05-06 21:26:46 -07:00
fence.cpp Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
indexing.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
jit_module.cpp [CUDA] Put version in ptx cache dir path (#2352) 2025-07-10 07:24:21 -07:00
jit_module.h [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
kernel_utils.cu RoPE for CUDA (#2293) 2025-06-15 06:08:07 -07:00
kernel_utils.cuh [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
layer_norm.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
logsumexp.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
matmul.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
no_cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
primitives.cu MoE backward improvements (#2335) 2025-07-07 17:59:53 -07:00
random.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
reduce.cu [CUDA] Fix reductions (#2314) 2025-06-27 12:59:20 -07:00
rms_norm.cu Fix compilation with CUDA 11 (#2331) 2025-07-07 20:00:43 -07:00
rope.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
slicing.cpp rebase + nit (#2260) 2025-06-10 10:51:51 -07:00
softmax.cu Fix compilation with CUDA 11 (#2331) 2025-07-07 20:00:43 -07:00
sort.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
ternary.cu [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) 2025-07-09 18:48:43 -07:00
unary.cu [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) 2025-07-09 18:48:43 -07:00
utils.cpp [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
utils.h [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
worker.cpp [CUDA] synch properly waits for all tasks to finish and clear (#2303) 2025-06-17 12:03:25 -07:00
worker.h CUDA backend: backbone (#2075) 2025-05-06 21:26:46 -07:00