mlx/mlx/backend/cuda
2025-07-30 13:05:51 +09:00
copy Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
device Move arange to its own file (#2438) 2025-07-30 13:05:51 +09:00
gemms faster rms norm (#2433) 2025-07-29 13:12:00 -07:00
reduce fix complex reduce + nan propagation in min and max (#2377) 2025-07-15 18:19:47 -07:00
allocator.cpp [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
allocator.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
arange.cu Move arange to its own file (#2438) 2025-07-30 13:05:51 +09:00
arg_reduce.cu Remove thrust iterators (#2396) 2025-07-21 07:30:27 -07:00
bin2h.cmake CUDA backend: compile (#2276) 2025-06-12 17:08:39 -07:00
binary_two.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
binary.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
CMakeLists.txt Move arange to its own file (#2438) 2025-07-30 13:05:51 +09:00
compiled.cpp Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
conv.cpp [CUDA] Initial implementation of Convolution with cuDNN (#2385) 2025-07-25 08:12:10 +09:00
copy.cu Cuda perf tuning (#2307) 2025-06-20 14:50:57 -07:00
cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
cuda.h start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
device.cpp Add more CUDA architectures for PyPi package (#2427) 2025-07-28 12:35:15 -07:00
device.h [CUDA] Initial implementation of Convolution with cuDNN (#2385) 2025-07-25 08:12:10 +09:00
eval.cpp [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
event.cu [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
event.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
fence.cpp Avoid atomic updates across CPU/GPU in CUDA event (#2231) 2025-06-03 16:49:06 -07:00
indexing.cpp Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
jit_module.cpp [CUDA] Fix segfault on exit (#2424) 2025-07-27 08:08:13 -07:00
jit_module.h [CUDA] Fix segfault on exit (#2424) 2025-07-27 08:08:13 -07:00
kernel_utils.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
kernel_utils.cuh Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
layer_norm.cu faster rms norm (#2433) 2025-07-29 13:12:00 -07:00
logsumexp.cu Cuda faster softmax (#2435) 2025-07-29 17:18:12 -07:00
lru_cache.h [CUDA] Initial implementation of Convolution with cuDNN (#2385) 2025-07-25 08:12:10 +09:00
matmul.cpp [CUDA] Always use batched matmul (#2404) 2025-07-24 20:46:02 -07:00
no_cuda.cpp start cuda circle config (#2256) 2025-06-10 21:19:47 -07:00
primitives.cpp Move arange to its own file (#2438) 2025-07-30 13:05:51 +09:00
quantized.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
random.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
reduce.cu faster rms norm (#2433) 2025-07-29 13:12:00 -07:00
rms_norm.cu faster rms norm (#2433) 2025-07-29 13:12:00 -07:00
rope.cu [CUDA] Switch to CUDA graphs (#2317) 2025-07-02 15:59:13 -07:00
scan.cu Add contiguous_copy_gpu util for copying array (#2379) 2025-07-18 06:44:25 -07:00
slicing.cpp rebase + nit (#2260) 2025-06-10 10:51:51 -07:00
softmax.cu Cuda faster softmax (#2435) 2025-07-29 17:18:12 -07:00
sort.cu Add contiguous_copy_gpu util for copying array (#2379) 2025-07-18 06:44:25 -07:00
ternary.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
unary.cu Remove the kernel arg from get_launch_args (#2437) 2025-07-30 11:43:02 +09:00
utils.cpp [CUDA] Initial implementation of Convolution with cuDNN (#2385) 2025-07-25 08:12:10 +09:00
utils.h [CUDA] Initial implementation of Convolution with cuDNN (#2385) 2025-07-25 08:12:10 +09:00
worker.cpp [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00
worker.h [CUDA] Simplify allocator (#2392) 2025-07-22 08:24:01 -07:00