Name | Last commit message | Last commit date
binary | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
conv | [CUDA] Add GEMM-based fallback convolution kernels (#2511) | 2025-08-20 10:06:22 +09:00
copy | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
device | fix power (#2523) | 2025-08-21 06:46:01 -07:00
gemms | [CUDA] Add GEMM-based fallback convolution kernels (#2511) | 2025-08-20 10:06:22 +09:00
quantized | Use SmallVector for shapes and strides (#2454) | 2025-08-05 09:41:03 +09:00
reduce | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
steel | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
unary | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
allocator.cpp | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00
allocator.h | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00
arange.cu | Move arange to its own file (#2438) | 2025-07-30 13:05:51 +09:00
arg_reduce.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
bin2h.cmake | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
binary_two.cu | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
CMakeLists.txt | NCCL backend (#2476) | 2025-08-21 11:56:15 -07:00
compiled.cpp | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
conv.cpp | [CUDA] Add GEMM-based fallback convolution kernels (#2511) | 2025-08-20 10:06:22 +09:00
copy.cu | Cuda perf tuning (#2307) | 2025-06-20 14:50:57 -07:00
cuda.cpp | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00
cuda.h | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00
cudnn_utils.cpp | [CUDA] Fix stride of singleton dims before passing to cuDNN (#2521) | 2025-08-21 08:55:26 +09:00
cudnn_utils.h | Split cuDNN helpers into a separate header (#2491) | 2025-08-20 09:29:28 +09:00
custom_kernel.cpp | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
device.cpp | Split cuDNN helpers into a separate header (#2491) | 2025-08-20 09:29:28 +09:00
device.h | Split cuDNN helpers into a separate header (#2491) | 2025-08-20 09:29:28 +09:00
distributed.cu | NCCL backend (#2476) | 2025-08-21 11:56:15 -07:00
eval.cpp | [CUDA] Save primitive inputs faster (#2449) | 2025-08-01 10:16:06 +09:00
event.cu | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00
event.h | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00
fence.cpp | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00
indexing.cpp | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
jit_module.cpp | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
jit_module.h | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
kernel_utils.cu | Remove the kernel arg from get_launch_args (#2437) | 2025-07-30 11:43:02 +09:00
kernel_utils.cuh | Use SmallVector for shapes and strides (#2454) | 2025-08-05 09:41:03 +09:00
layer_norm.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
logsumexp.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
lru_cache.h | Use LRU cache for cuda graph (#2448) | 2025-08-02 21:28:57 +09:00
matmul.cpp | Rename cu::Matmul to CublasGemm (#2488) | 2025-08-13 09:37:40 +09:00
no_cuda.cpp | Custom cuda kernel (#2517) | 2025-08-20 17:20:22 -07:00
primitives.cpp | NCCL backend (#2476) | 2025-08-21 11:56:15 -07:00
random.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
reduce.cu | faster rms norm (#2433) | 2025-07-29 13:12:00 -07:00
rms_norm.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
rope.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
scaled_dot_product_attention.cu | Add CUDA sdpa vector (#2468) | 2025-08-06 21:40:26 -07:00
scan.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
slicing.cpp | rebase + nit (#2260) | 2025-06-10 10:51:51 -07:00
softmax.cu | [CUDA] Matmul utils initial commit (#2441) | 2025-08-01 14:22:25 -07:00
sort.cu | [CUDA] Fix conv grads with groups (#2495) | 2025-08-16 10:09:18 +09:00
ternary.cu | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
unary.cu | Faster general unary op (#2472) | 2025-08-15 15:04:12 -07:00
utils.cpp | Split cuDNN helpers into a separate header (#2491) | 2025-08-20 09:29:28 +09:00
utils.h | Split cuDNN helpers into a separate header (#2491) | 2025-08-20 09:29:28 +09:00
worker.cpp | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00
worker.h | [CUDA] Simplify allocator (#2392) | 2025-07-22 08:24:01 -07:00