mlx/mlx/backend/cuda at afb9817599ac8b3c0399274ebd7bc6d30dbb30b8 - mlx

zhangyiss/mlx

mirror of https://github.com/ml-explore/mlx.git synced 2025-12-16 01:49:05 +08:00

Cheng afb9817599 [CUDA] Put version in ptx cache dir path (#2352)

2025-07-10 07:24:21 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| copy | [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) | 2025-07-09 18:48:43 -07:00 |
| device | Fix compilation with CUDA 11 (#2331) | 2025-07-07 20:00:43 -07:00 |
| iterators | CUDA backend: argreduce (#2270) | 2025-06-11 13:26:17 -07:00 |
| reduce | Fix compilation with CUDA 11 (#2331) | 2025-07-07 20:00:43 -07:00 |
| allocator.cpp | Cuda perf tuning (#2307) | 2025-06-20 14:50:57 -07:00 |
| allocator.h | Avoid invoking allocator::malloc when creating CUDA event (#2232) | 2025-06-03 16:48:40 -07:00 |
| arg_reduce.cu | Fix compilation with CUDA 11 (#2331) | 2025-07-07 20:00:43 -07:00 |
| bin2h.cmake | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00 |
| binary_two.cu | [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) | 2025-07-09 18:48:43 -07:00 |
| binary.cu | [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) | 2025-07-09 18:48:43 -07:00 |
| CMakeLists.txt | [CUDA] Fix reductions (#2314) | 2025-06-27 12:59:20 -07:00 |
| compiled.cpp | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| copy.cu | Cuda perf tuning (#2307) | 2025-06-20 14:50:57 -07:00 |
| cuda.cpp | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00 |
| cuda.h | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00 |
| device.cpp | [CUDA] Set current device before cudaGraphLaunch (#2351) | 2025-07-10 07:24:02 -07:00 |
| device.h | [CUDA] Set current device before cudaGraphLaunch (#2351) | 2025-07-10 07:24:02 -07:00 |
| eval.cpp | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| event.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| event.h | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00 |
| fence.cpp | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00 |
| indexing.cpp | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| jit_module.cpp | [CUDA] Put version in ptx cache dir path (#2352) | 2025-07-10 07:24:21 -07:00 |
| jit_module.h | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| kernel_utils.cu | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00 |
| kernel_utils.cuh | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| layer_norm.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| logsumexp.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| matmul.cpp | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| no_cuda.cpp | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00 |
| primitives.cu | MoE backward improvements (#2335) | 2025-07-07 17:59:53 -07:00 |
| random.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| reduce.cu | [CUDA] Fix reductions (#2314) | 2025-06-27 12:59:20 -07:00 |
| rms_norm.cu | Fix compilation with CUDA 11 (#2331) | 2025-07-07 20:00:43 -07:00 |
| rope.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| slicing.cpp | rebase + nit (#2260) | 2025-06-10 10:51:51 -07:00 |
| softmax.cu | Fix compilation with CUDA 11 (#2331) | 2025-07-07 20:00:43 -07:00 |
| sort.cu | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| ternary.cu | [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) | 2025-07-09 18:48:43 -07:00 |
| unary.cu | [CUDA] Do vectorized store/load in contiguous elementwise ops (#2342) | 2025-07-09 18:48:43 -07:00 |
| utils.cpp | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| utils.h | [CUDA] Switch to CUDA graphs (#2317) | 2025-07-02 15:59:13 -07:00 |
| worker.cpp | [CUDA] synch properly waits for all tasks to finish and clear (#2303) | 2025-06-17 12:03:25 -07:00 |
| worker.h | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00 |