kernels
Fix mask broadcasting bug and add relevant test (#1003)
2024-04-17 17:33:48 -07:00
mps
copyright + ack
2023-11-30 11:12:53 -08:00
allocator.cpp
Improve profiling with gpu tracing (#969)
2024-04-07 21:47:43 -07:00
allocator.h
Some fixes in cache / thread safety (#777)
2024-03-05 13:30:50 -08:00
CMakeLists.txt
Shared events for synchronization + async eval (#998)
2024-04-17 06:16:02 -07:00
compiled_preamble.h
Kernel generation (#614)
2024-02-07 13:15:59 -08:00
compiled.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
conv.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
copy.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
copy.h
Add a SliceUpdate op and primitive (#850)
2024-03-20 10:39:25 -07:00
device.cpp
Shared events for synchronization + async eval (#998)
2024-04-17 06:16:02 -07:00
device.h
No copy command encoder (#986)
2024-04-11 21:15:36 -07:00
event.cpp
Shared events for synchronization + async eval (#998)
2024-04-17 06:16:02 -07:00
fft.cpp
Metal FFT for powers of 2 up to 2048 (#915)
2024-04-11 21:40:06 -07:00
indexing.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
make_compiled_preamble.sh
quote file name (#670)
2024-02-11 10:33:30 -08:00
matmul.cpp
Fix mask broadcasting bug and add relevant test (#1003)
2024-04-17 17:33:48 -07:00
matmul.h
No copy gems (#801)
2024-03-12 13:13:41 -07:00
metal_impl.h
Add synchronize function (#1006)
2024-04-22 08:25:46 -07:00
metal.cpp
Add synchronize function (#1006)
2024-04-22 08:25:46 -07:00
metal.h
Add synchronize function (#1006)
2024-04-22 08:25:46 -07:00
normalization.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
primitives.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
quantized.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
reduce.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
reduce.h
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
rope.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
scaled_dot_product_attention.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
scan.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
softmax.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
sort.cpp
Explicit barriers with concurrent dispatch (#977)
2024-04-10 21:45:31 -07:00
utils.h
Metal FFT for powers of 2 up to 2048 (#915)
2024-04-11 21:40:06 -07:00