| File | Last commit | Date |
| --- | --- | --- |
| binary.h | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00 |
| broadcasting.cpp | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| broadcasting.h | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| buffer_cache.h | Add memory cache to CUDA backend (#2221) | 2025-05-30 12:12:54 -07:00 |
| CMakeLists.txt | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| common.cpp | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| compiled.cpp | Share more common code in Compiled (#2240) | 2025-06-03 16:48:50 -07:00 |
| compiled.h | Share more common code in Compiled (#2240) | 2025-06-03 16:48:50 -07:00 |
| copy.h | CUDA backend: unary ops (#2158) | 2025-06-09 06:45:08 -07:00 |
| hadamard.h | GPU Hadamard for large N (#1879) | 2025-05-01 17:19:17 -07:00 |
| load.cpp | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00 |
| matmul.h | CUDA backend: matmul (#2241) | 2025-06-06 12:24:04 -07:00 |
| reduce.cpp | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00 |
| reduce.h | Refactor common into cpu specific and truly common (#1817) | 2025-02-03 15:58:02 -08:00 |
| slicing.cpp | redesign for faster cpu/gpu synch (#1869) | 2025-03-06 19:23:38 -08:00 |
| slicing.h | Fix a couple of slicing bugs (#1827) | 2025-02-05 19:50:08 -08:00 |
| ternary.h | fix malloc or wait deadlock (#1976) | 2025-03-20 16:48:43 -07:00 |
| unary.h | CUDA backend: unary ops (#2158) | 2025-06-09 06:45:08 -07:00 |
| utils.cpp | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00 |
| utils.h | RoPE for CUDA (#2293) | 2025-06-15 06:08:07 -07:00 |