| Name | Last commit | Date |
| --- | --- | --- |
| jit | Gather mm new kernel and small refactoring (#2040) | 2025-04-14 16:37:36 -07:00 |
| kernels | Fix fft for integer overflow (#2161) | 2025-05-09 14:25:12 -07:00 |
| allocator.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| allocator.h | wire cache (#2006) | 2025-03-25 18:54:01 -07:00 |
| binary.cpp | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |
| binary.h | Fixes for large arrays with a few ops (#1299) | 2024-07-30 17:18:39 -07:00 |
| CMakeLists.txt | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| compiled.cpp | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |
| conv.cpp | fix: conv_general differences between gpu, cpu (#2070) | 2025-05-09 10:26:52 -07:00 |
| copy.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| custom_kernel.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| device.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| device.h | Add set_threadgroup_memory_length to CommandEncoder | 2025-05-13 21:45:30 -04:00 |
| distributed.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| eval.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| event.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| fence.cpp | fix input coherent kernel launch (#2153) | 2025-05-05 17:30:50 -07:00 |
| fft.cpp | Fix fft for integer overflow (#2161) | 2025-05-09 14:25:12 -07:00 |
| hadamard.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| indexing.cpp | Add remove_index utility (#2173) | 2025-05-13 17:09:56 -07:00 |
| jit_kernels.cpp | Gather qmm batched kernel and refactoring of quantized (#2078) | 2025-04-17 13:53:11 -07:00 |
| kernels.h | Gather qmm batched kernel and refactoring of quantized (#2078) | 2025-04-17 13:53:11 -07:00 |
| logsumexp.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| make_compiled_preamble.sh | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00 |
| matmul.cpp | Close a couple edge case bugs: hadamard and addmm on empty inputs (#2177) | 2025-05-12 10:48:57 -07:00 |
| matmul.h | Use int64 stride everywhere (#1671) | 2024-12-09 11:09:02 -08:00 |
| metal.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| metal.h | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| no_metal.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| nojit_kernels.cpp | Gather qmm batched kernel and refactoring of quantized (#2078) | 2025-04-17 13:53:11 -07:00 |
| normalization.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| primitives.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| quantized.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| reduce.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| reduce.h | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00 |
| resident.cpp | Generalize gpu backend (#2138) | 2025-04-30 09:08:17 -07:00 |
| resident.h | Wired (#1510) | 2024-10-25 09:35:33 -07:00 |
| rope.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| scaled_dot_product_attention.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| scan.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| slicing.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| softmax.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| sort.cpp | Move common gpu primitives to backend/gpu (#2145) | 2025-05-05 13:45:29 -07:00 |
| ternary.cpp | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |
| ternary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00 |
| unary.cpp | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |
| unary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00 |
| utils.cpp | Fp64 on the CPU (#1843) | 2025-02-07 15:52:22 -08:00 |
| utils.h | fix bw for elementwise ops (#2151) | 2025-05-05 06:15:04 -07:00 |