jit | More fixes for arrays with large sizes (#1405) | 2024-09-17 12:46:31 -07:00 |
kernels | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
allocator.cpp | Allow querying the allocator for the buffer size (#1404) | 2024-09-11 21:02:16 -07:00 |
allocator.h | Allow querying the allocator for the buffer size (#1404) | 2024-09-11 21:02:16 -07:00 |
binary.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
binary.h | Fixes for large arrays with a few ops (#1299) | 2024-07-30 17:18:39 -07:00 |
CMakeLists.txt | Chore: add pre-commit hook for cmake (#1362) | 2024-09-16 12:53:01 -07:00 |
compiled.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
conv.cpp | Conv grad with groups + bugfix (#1449) | 2024-10-06 07:08:53 -07:00 |
copy.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
copy.h | Fix copying scalars by adding fill_gpu (#1402) | 2024-09-09 15:54:08 -07:00 |
custom_kernel.cpp | Make the GPU device more thread safe (#1478) | 2024-10-12 17:49:15 -07:00 |
device.cpp | Make the GPU device more thread safe (#1478) | 2024-10-12 17:49:15 -07:00 |
device.h | Make the GPU device more thread safe (#1478) | 2024-10-12 17:49:15 -07:00 |
distributed.cpp | Adds send/recv ops in distributed (#1366) | 2024-08-26 23:01:37 -07:00 |
event.cpp | Fix array is_available race cases (#1468) | 2024-10-07 19:13:50 -07:00 |
fft.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
hadamard.cpp | Make the GPU device more thread safe (#1478) | 2024-10-12 17:49:15 -07:00 |
indexing.cpp | Make the GPU device more thread safe (#1478) | 2024-10-12 17:49:15 -07:00 |
jit_kernels.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
kernels.h | fix jit reduce (#1395) | 2024-09-04 14:03:10 -07:00 |
make_compiled_preamble.sh | fix compiling with space in paths (#1332) | 2024-08-15 16:39:24 -07:00 |
matmul.cpp | Conv grad with groups + bugfix (#1449) | 2024-10-06 07:08:53 -07:00 |
matmul.h | Conv grad with groups + bugfix (#1449) | 2024-10-06 07:08:53 -07:00 |
metal_impl.h | Add synchronize function (#1006) | 2024-04-22 08:25:46 -07:00 |
metal.cpp | Fix array is_available race cases (#1468) | 2024-10-07 19:13:50 -07:00 |
metal.h | Reset peak memory (#1074) | 2024-05-03 17:12:51 -07:00 |
nojit_kernels.cpp | fix jit reduce (#1395) | 2024-09-04 14:03:10 -07:00 |
normalization.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
primitives.cpp | Avoid io timeout for large arrays (#1442) | 2024-09-27 13:32:14 -07:00 |
quantized.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
reduce.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
reduce.h | Further reduction tuning (#1349) | 2024-08-23 10:35:25 -07:00 |
rope.cpp | Xcode 160 (#1384) | 2024-09-10 15:15:17 -07:00 |
scaled_dot_product_attention.cpp | Metal shaders for memory efficient self attention on large sequences (#964) | 2024-06-03 09:16:19 -07:00 |
scan.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
slicing.cpp | Fix copying scalars by adding fill_gpu (#1402) | 2024-09-09 15:54:08 -07:00 |
slicing.h | Fix slice data size (#1394) | 2024-09-04 19:10:43 -07:00 |
softmax.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
sort.cpp | Fix normalization check_input (#1452) | 2024-10-03 13:26:56 -07:00 |
ternary.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
ternary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00 |
unary.cpp | Faster metal compiled kernels + some fixes (#1486) | 2024-10-14 12:45:38 -07:00 |
unary.h | Add some internal GPU apis (#1177) | 2024-06-04 09:24:26 -07:00 |
utils.cpp | Add gemv masked to JIT plus some fixes (#1310) | 2024-08-07 13:38:07 -07:00 |
utils.h | Add gemv masked to JIT plus some fixes (#1310) | 2024-08-07 13:38:07 -07:00 |