Name | Last commit | Date
jit | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
kernels | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
allocator.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
allocator.h | Wired (#1510) | 2024-10-25 09:35:33 -07:00
binary.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
binary.h | Fixes for large arrays with a few ops (#1299) | 2024-07-30 17:18:39 -07:00
CMakeLists.txt | Use osx deployment target to pick Metal version (#1595) | 2024-11-18 19:16:49 -08:00
compiled.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
conv.cpp | fix dispatch threads for a few kernels (#1594) | 2024-11-18 08:35:25 -08:00
copy.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
copy.h | Fix copying scalars by adding fill_gpu (#1402) | 2024-09-09 15:54:08 -07:00
custom_kernel.cpp | fix dispatch threads for a few kernels (#1594) | 2024-11-18 08:35:25 -08:00
device.cpp | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00
device.h | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
distributed.cpp | Adds send/recv ops in distributed (#1366) | 2024-08-26 23:01:37 -07:00
event.cpp | Fix array is_available race cases (#1468) | 2024-10-07 19:13:50 -07:00
fft.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
hadamard.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
indexing.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
jit_kernels.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
kernels.h | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00
make_compiled_preamble.sh | Dispatch bf16 at run time when using the JIT (#1584) | 2024-11-15 16:54:36 -08:00
matmul.cpp | fix dispatch threads for a few kernels (#1594) | 2024-11-18 08:35:25 -08:00
matmul.h | Wired (#1510) | 2024-10-25 09:35:33 -07:00
metal_impl.h | Add synchronize function (#1006) | 2024-04-22 08:25:46 -07:00
metal.cpp | Bfs width limit (#1568) | 2024-11-08 15:00:46 -08:00
metal.h | Wired (#1510) | 2024-10-25 09:35:33 -07:00
nojit_kernels.cpp | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00
normalization.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
primitives.cpp | Fix view scalar bug segfault (#1603) | 2024-11-19 10:54:05 -08:00
quantized.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
reduce.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
reduce.h | Reductions update (#1351) | 2024-11-04 22:25:16 -08:00
resident.cpp | Skip using Residency sets in VMs (#1537) | 2024-10-29 19:37:23 -07:00
resident.h | Wired (#1510) | 2024-10-25 09:35:33 -07:00
rope.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
scaled_dot_product_attention.cpp | 2-Pass Sdpa Inference Kernel (#1597) | 2024-11-18 17:31:53 -08:00
scan.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
slicing.cpp | Fix copying scalars by adding fill_gpu (#1402) | 2024-09-09 15:54:08 -07:00
slicing.h | Fix slice data size (#1394) | 2024-09-04 19:10:43 -07:00
softmax.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
sort.cpp | Fully wrap the command encoder (#1572) | 2024-11-08 11:50:21 -08:00
ternary.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
ternary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
unary.cpp | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00
unary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00
utils.cpp | Fix thread group for large arrays (#1543) | 2024-10-30 16:25:12 -07:00
utils.h | Faster indexing math in a few kernels (#1589) | 2024-11-18 19:52:00 -08:00