mlx/mlx/backend/cuda
Anastasiia Filippova 9392fc3f88
NCCL backend (#2476)
2025-08-21 11:56:15 -07:00
binary Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
conv [CUDA] Add GEMM-based fallback convolution kernels (#2511) 2025-08-20 10:06:22 +09:00
copy Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
device fix power (#2523) 2025-08-21 06:46:01 -07:00
gemms [CUDA] Add GEMM-based fallback convolution kernels (#2511) 2025-08-20 10:06:22 +09:00
quantized Use SmallVector for shapes and strides (#2454) 2025-08-05 09:41:03 +09:00
reduce [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
steel [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
unary Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
allocator.cpp [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
allocator.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
arange.cu Move arange to its own file (#2438) 2025-07-30 13:05:51 +09:00
arg_reduce.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
bin2h.cmake CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
binary_two.cu Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
CMakeLists.txt NCCL backend (#2476) 2025-08-21 11:56:15 -07:00
compiled.cpp Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
conv.cpp [CUDA] Add GEMM-based fallback convolution kernels (#2511) 2025-08-20 10:06:22 +09:00
copy.cu Cuda perf tuning (#2307) 2025-06-20 14:50:57 -07:00
cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
cuda.h start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
cudnn_utils.cpp [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521) 2025-08-21 08:55:26 +09:00
cudnn_utils.h Split cuDNN helpers into a separate header (#2491) 2025-08-20 09:29:28 +09:00
custom_kernel.cpp Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
device.cpp Split cuDNN helpers into a separate header (#2491) 2025-08-20 09:29:28 +09:00
device.h Split cuDNN helpers into a separate header (#2491) 2025-08-20 09:29:28 +09:00
distributed.cu NCCL backend (#2476) 2025-08-21 11:56:15 -07:00
eval.cpp [CUDA] Save primitive inputs faster (#2449) 2025-08-01 10:16:06 +09:00
event.cu [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
event.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
fence.cpp Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
indexing.cpp Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
jit_module.cpp Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
jit_module.h Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
kernel_utils.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
kernel_utils.cuh Use SmallVector for shapes and strides (#2454) 2025-08-05 09:41:03 +09:00
layer_norm.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
logsumexp.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
lru_cache.h Use LRU cache for cuda graph (#2448) 2025-08-02 21:28:57 +09:00
matmul.cpp Rename cu::Matmul to CublasGemm (#2488) 2025-08-13 09:37:40 +09:00
no_cuda.cpp Custom cuda kernel (#2517) 2025-08-20 17:20:22 -07:00
primitives.cpp NCCL backend (#2476) 2025-08-21 11:56:15 -07:00
random.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
reduce.cu faster rms norm (#2433) 2025-07-29 13:12:00 -07:00
rms_norm.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
rope.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
scaled_dot_product_attention.cu Add CUDA sdpa vector (#2468) 2025-08-06 21:40:26 -07:00
scan.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
slicing.cpp rebase + nit (#2260) 2025-06-10 10:51:51 -07:00
softmax.cu [CUDA] Matmul utils initial commit (#2441) 2025-08-01 14:22:25 -07:00
sort.cu [CUDA] Fix conv grads with groups (#2495) 2025-08-16 10:09:18 +09:00
ternary.cu Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
unary.cu Faster general unary op (#2472) 2025-08-15 15:04:12 -07:00
utils.cpp Split cuDNN helpers into a separate header (#2491) 2025-08-20 09:29:28 +09:00
utils.h Split cuDNN helpers into a separate header (#2491) 2025-08-20 09:29:28 +09:00
worker.cpp [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
worker.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00