mlx/mlx/backend/cuda at a0ae49d397252a05d8a6881bf950d9c80614d2d9 - mlx

copy

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

device

Move arange to its own file (#2438)

2025-07-30 13:05:51 +09:00

gemms

faster rms norm (#2433)

2025-07-29 13:12:00 -07:00

reduce

fix complex reduce + nan propagation in min and max (#2377)

2025-07-15 18:19:47 -07:00

allocator.cpp

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

allocator.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

arange.cu

Move arange to its own file (#2438)

2025-07-30 13:05:51 +09:00

arg_reduce.cu

Remove thrust iterators (#2396)

2025-07-21 07:30:27 -07:00

bin2h.cmake

CUDA backend: compile (#2276)

2025-06-12 17:08:39 -07:00

binary_two.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

binary.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

CMakeLists.txt

Move arange to its own file (#2438)

2025-07-30 13:05:51 +09:00

compiled.cpp

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

conv.cpp

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

copy.cu

Cuda perf tuning (#2307)

2025-06-20 14:50:57 -07:00

cuda.cpp

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

cuda.h

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

device.cpp

Add more CUDA architectures for PyPi package (#2427)

2025-07-28 12:35:15 -07:00

device.h

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

eval.cpp

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

event.cu

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

event.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

fence.cpp

Avoid atomic updates across CPU/GPU in CUDA event (#2231)

2025-06-03 16:49:06 -07:00

indexing.cpp

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

jit_module.cpp

[CUDA] Fix segfault on exit (#2424)

2025-07-27 08:08:13 -07:00

jit_module.h

[CUDA] Fix segfault on exit (#2424)

2025-07-27 08:08:13 -07:00

kernel_utils.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

kernel_utils.cuh

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

layer_norm.cu

faster rms norm (#2433)

2025-07-29 13:12:00 -07:00

logsumexp.cu

Cuda faster softmax (#2435)

2025-07-29 17:18:12 -07:00

lru_cache.h

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

matmul.cpp

[CUDA] Always use batched matmul (#2404)

2025-07-24 20:46:02 -07:00

no_cuda.cpp

start cuda circle config (#2256)

2025-06-10 21:19:47 -07:00

primitives.cpp

Move arange to its own file (#2438)

2025-07-30 13:05:51 +09:00

quantized.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

random.cu

[CUDA] Switch to CUDA graphs (#2317)

2025-07-02 15:59:13 -07:00

reduce.cu

faster rms norm (#2433)

2025-07-29 13:12:00 -07:00

rms_norm.cu

faster rms norm (#2433)

2025-07-29 13:12:00 -07:00

rope.cu

[CUDA] Switch to CUDA graphs (#2317)

2025-07-02 15:59:13 -07:00

scan.cu

Add contiguous_copy_gpu util for copying array (#2379)

2025-07-18 06:44:25 -07:00

slicing.cpp

rebase + nit (#2260)

2025-06-10 10:51:51 -07:00

softmax.cu

Cuda faster softmax (#2435)

2025-07-29 17:18:12 -07:00

sort.cu

Add contiguous_copy_gpu util for copying array (#2379)

2025-07-18 06:44:25 -07:00

ternary.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

unary.cu

Remove the kernel arg from get_launch_args (#2437)

2025-07-30 11:43:02 +09:00

utils.cpp

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

utils.h

[CUDA] Initial implementation of Convolution with cuDNN (#2385)

2025-07-25 08:12:10 +09:00

worker.cpp

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00

worker.h

[CUDA] Simplify allocator (#2392)

2025-07-22 08:24:01 -07:00