copy | [CUDA] Fix back-end bugs and enable corresponding tests (#2296) | 2025-06-16 08:45:40 -07:00
device | more bug fixes | 2025-06-16 09:35:58 -07:00
iterators | CUDA backend: argreduce (#2270) | 2025-06-11 13:26:17 -07:00
reduce | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
allocator.cpp | Avoid invoking allocator::malloc when creating CUDA event (#2232) | 2025-06-03 16:48:40 -07:00
allocator.h | Avoid invoking allocator::malloc when creating CUDA event (#2232) | 2025-06-03 16:48:40 -07:00
arg_reduce.cu | Fix cuda arg reduce (#2291) | 2025-06-14 17:54:00 -07:00
bin2h.cmake | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
binary.cu | more bug fixes | 2025-06-16 09:35:58 -07:00
CMakeLists.txt | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00
compiled.cpp | more bug fixes | 2025-06-16 09:35:58 -07:00
copy.cu | [CUDA] Fix back-end bugs and enable corresponding tests (#2296) | 2025-06-16 08:45:40 -07:00
cuda.cpp | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00
cuda.h | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00
device.cpp | CUDA backend: matmul (#2241) | 2025-06-06 12:24:04 -07:00
device.h | CUDA backend: matmul (#2241) | 2025-06-06 12:24:04 -07:00
eval.cpp | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00
event.cu | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00
event.h | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00
fence.cpp | Avoid atomic updates across CPU/GPU in CUDA event (#2231) | 2025-06-03 16:49:06 -07:00
indexing.cpp | CUDA backend: indexing ops (#2277) | 2025-06-12 21:44:19 -07:00
jit_module.cpp | fix cuda jit (#2287) | 2025-06-13 19:21:46 -07:00
jit_module.h | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
kernel_utils.cu | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00
kernel_utils.cuh | [CUDA] Fix back-end bugs and enable corresponding tests (#2296) | 2025-06-16 08:45:40 -07:00
layer_norm.cu | [CUDA] RMSNorm and VJP (#2280) | 2025-06-12 17:09:49 -07:00
logsumexp.cu | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
matmul.cpp | Fix cuda arg reduce (#2291) | 2025-06-14 17:54:00 -07:00
no_cuda.cpp | start cuda circle config (#2256) | 2025-06-10 21:19:47 -07:00
primitives.cu | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00
random.cu | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00
reduce.cu | CUDA backend: reduce (#2269) | 2025-06-11 11:22:25 -07:00
rms_norm.cu | [CUDA] RMSNorm and VJP (#2280) | 2025-06-12 17:09:49 -07:00
rope.cu | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00
slicing.cpp | rebase + nit (#2260) | 2025-06-10 10:51:51 -07:00
softmax.cu | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
sort.cu | CUDA backend: sort (#2262) | 2025-06-10 08:59:47 -07:00
ternary.cu | more bug fixes | 2025-06-16 09:35:58 -07:00
unary.cu | more bug fixes | 2025-06-16 09:35:58 -07:00
utils.cpp | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
utils.h | CUDA backend: compile (#2276) | 2025-06-12 17:08:39 -07:00
worker.cpp | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00
worker.h | CUDA backend: backbone (#2075) | 2025-05-06 21:26:46 -07:00