jit | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
kernels | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
mps | copyright + ack | 2023-11-30 11:12:53 -08:00
allocator.cpp | Reset peak memory (#1074) | 2024-05-03 17:12:51 -07:00
allocator.h | Reset peak memory (#1074) | 2024-05-03 17:12:51 -07:00
binary.cpp | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
binary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
CMakeLists.txt | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
compiled.cpp | Fix a couple bugs (#1161) | 2024-05-28 15:18:18 -07:00
conv.cpp | Option to JIT steel gemm / conv (#1139) | 2024-05-23 18:07:34 -07:00
copy.cpp | Fix Metal API validation for empty concat (#1183) | 2024-06-04 13:17:08 -07:00
copy.h | Add a SliceUpdate op and primitive (#850) | 2024-03-20 10:39:25 -07:00
device.cpp | JIT compile option for binary minimization (#1091) | 2024-05-22 12:57:13 -07:00
device.h | Fix offset bug for device buffers (#1151) | 2024-05-22 15:50:05 -07:00
event.cpp | Shared events for synchronization + async eval (#998) | 2024-04-17 06:16:02 -07:00
fft.cpp | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
indexing.cpp | More jitting (#1132) | 2024-05-23 16:23:44 -07:00
jit_kernels.cpp | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
kernels.h | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
make_compiled_preamble.sh | Option to JIT steel gemm / conv (#1139) | 2024-05-23 18:07:34 -07:00
matmul.cpp | Fix matvec vector stride bug (#1168) | 2024-05-29 12:18:28 -07:00
matmul.h | Add groups to Conv1d (#948) | 2024-04-27 06:24:57 -07:00
metal_impl.h | Add synchronize function (#1006) | 2024-04-22 08:25:46 -07:00
metal.cpp | Split encoders in non-concurrent context with a max ops per encoder (#1085) | 2024-05-09 16:21:02 -07:00
metal.h | Reset peak memory (#1074) | 2024-05-03 17:12:51 -07:00
nojit_kernels.cpp | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00
normalization.cpp | Split encoders in non-concurrent context with a max ops per encoder (#1085) | 2024-05-09 16:21:02 -07:00
primitives.cpp | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
quantized.cpp | Rename block sparse (#1149) | 2024-05-22 07:48:34 -07:00
reduce.cpp | Fix a couple bugs (#1161) | 2024-05-28 15:18:18 -07:00
reduce.h | Explicit barriers with concurrent dispatch (#977) | 2024-04-10 21:45:31 -07:00
rope.cpp | Split encoders in non-concurrent context with a max ops per encoder (#1085) | 2024-05-09 16:21:02 -07:00
scaled_dot_product_attention.cpp | Metal shaders for memory efficient self attention on large sequences (#964) | 2024-06-03 09:16:19 -07:00
scan.cpp | fix jit scan when output doesn't have primitive (#1190) | 2024-06-06 07:24:58 -07:00
slicing.cpp | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
slicing.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
softmax.cpp | More jitting (#1132) | 2024-05-23 16:23:44 -07:00
sort.cpp | Fix multi-block sort stride management (#1169) | 2024-05-31 11:10:54 -07:00
ternary.cpp | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
ternary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
unary.cpp | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
unary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
utils.h | Feature complete Metal FFT (#1102) | 2024-06-06 12:57:25 -07:00