mlx/mlx/backend/cuda at 9392fc3f88b8a7c2d8b13f0f4bb76e63dacfbab6 - mlx

binary

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

conv

[CUDA] Add GEMM-based fallback convolution kernels (#2511)

2025-08-20 10:06:22 +09:00

copy

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

device

fix power (#2523)

2025-08-21 06:46:01 -07:00

gemms

[CUDA] Add GEMM-based fallback convolution kernels (#2511)

2025-08-20 10:06:22 +09:00

quantized

Use SmallVector for shapes and strides (#2454)

2025-08-05 09:41:03 +09:00

reduce

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

steel

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

unary

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

allocator.cpp

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

allocator.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

arange.cu

Move arange to its own file (#2438)

2025-07-30 13:05:51 +09:00

arg_reduce.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

bin2h.cmake

CUDA backend: compile (#2276)

2025-06-12 17:08:39 -07:00

binary_two.cu

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

CMakeLists.txt

NCCL backend (#2476)

2025-08-21 11:56:15 -07:00

compiled.cpp

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

conv.cpp

[CUDA] Add GEMM-based fallback convolution kernels (#2511)

2025-08-20 10:06:22 +09:00

copy.cu

Cuda perf tuning (#2307)

2025-06-20 14:50:57 -07:00

cuda.cpp

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

cuda.h

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

cudnn_utils.cpp

[CUDA] Fix stride of singleton dims before passing to cuDNN (#2521)

2025-08-21 08:55:26 +09:00

cudnn_utils.h

Split cuDNN helpers into a separate header (#2491)

2025-08-20 09:29:28 +09:00

custom_kernel.cpp

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

device.cpp

Split cuDNN helpers into a separate header (#2491)

2025-08-20 09:29:28 +09:00

device.h

Split cuDNN helpers into a separate header (#2491)

2025-08-20 09:29:28 +09:00

distributed.cu

NCCL backend (#2476)

2025-08-21 11:56:15 -07:00

eval.cpp

[CUDA] Save primitive inputs faster (#2449)

2025-08-01 10:16:06 +09:00

event.cu

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

event.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

fence.cpp

Avoid atomic updates across CPU/GPU in CUDA event (#2231)

2025-06-03 16:49:06 -07:00

indexing.cpp

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

jit_module.cpp

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

jit_module.h

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

kernel_utils.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

kernel_utils.cuh

Use SmallVector for shapes and strides (#2454)

2025-08-05 09:41:03 +09:00

layer_norm.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

logsumexp.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

lru_cache.h

Use LRU cache for cuda graph (#2448)

2025-08-02 21:28:57 +09:00

matmul.cpp

Rename cu::Matmul to CublasGemm (#2488)

2025-08-13 09:37:40 +09:00

no_cuda.cpp

Custom cuda kernel (#2517)

2025-08-20 17:20:22 -07:00

primitives.cpp

NCCL backend (#2476)

2025-08-21 11:56:15 -07:00

random.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

reduce.cu

faster rms norm (#2433)

2025-07-29 13:12:00 -07:00

rms_norm.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

rope.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

scaled_dot_product_attention.cu

Add CUDA sdpa vector (#2468)

2025-08-06 21:40:26 -07:00

scan.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

slicing.cpp

rebase + nit (#2260)

2025-06-10 10:51:51 -07:00

softmax.cu

[CUDA] Matmul utils initial commit (#2441)

2025-08-01 14:22:25 -07:00

sort.cu

[CUDA] Fix conv grads with groups (#2495)

2025-08-16 10:09:18 +09:00

ternary.cu

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

unary.cu

Faster general unary op (#2472)

2025-08-15 15:04:12 -07:00

utils.cpp

Split cuDNN helpers into a separate header (#2491)

2025-08-20 09:29:28 +09:00

utils.h

Split cuDNN helpers into a separate header (#2491)

2025-08-20 09:29:28 +09:00

worker.cpp

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

worker.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00