mlx/mlx/backend/metal
Alex Barron 27d70c7d9d
Feature complete Metal FFT (#1102)
* feature complete metal fft

* fix contiguity bug

* jit fft

* simplify rader/bluestein constant computation

* remove kernel/utils.h dep

* remove bf16.h dep

* format

---------

Co-authored-by: Alex Barron <abarron22@apple.com>
2024-06-06 12:57:25 -07:00
jit Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
kernels Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
mps copyright + ack 2023-11-30 11:12:53 -08:00
allocator.cpp Reset peak memory (#1074) 2024-05-03 17:12:51 -07:00
allocator.h Reset peak memory (#1074) 2024-05-03 17:12:51 -07:00
binary.cpp Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
binary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
CMakeLists.txt Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
compiled.cpp Fix a couple bugs (#1161) 2024-05-28 15:18:18 -07:00
conv.cpp Option to JIT steel gemm / conv (#1139) 2024-05-23 18:07:34 -07:00
copy.cpp Fix Metal API validation for empty concat (#1183) 2024-06-04 13:17:08 -07:00
copy.h Add a SliceUpdate op and primitive (#850) 2024-03-20 10:39:25 -07:00
device.cpp JIT compile option for binary minimization (#1091) 2024-05-22 12:57:13 -07:00
device.h Fix offset bug for device buffers (#1151) 2024-05-22 15:50:05 -07:00
event.cpp Shared events for synchronization + async eval (#998) 2024-04-17 06:16:02 -07:00
fft.cpp Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
indexing.cpp More jitting (#1132) 2024-05-23 16:23:44 -07:00
jit_kernels.cpp Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
kernels.h Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
make_compiled_preamble.sh Option to JIT steel gemm / conv (#1139) 2024-05-23 18:07:34 -07:00
matmul.cpp Fix matvec vector stride bug (#1168) 2024-05-29 12:18:28 -07:00
matmul.h Add groups to Conv1d (#948) 2024-04-27 06:24:57 -07:00
metal_impl.h Add synchronize function (#1006) 2024-04-22 08:25:46 -07:00
metal.cpp Split encoders in non-concurrent context with a max ops per encoder (#1085) 2024-05-09 16:21:02 -07:00
metal.h Reset peak memory (#1074) 2024-05-03 17:12:51 -07:00
nojit_kernels.cpp Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
normalization.cpp Split encoders in non-concurrent context with a max ops per encoder (#1085) 2024-05-09 16:21:02 -07:00
primitives.cpp Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
quantized.cpp Rename block sparse (#1149) 2024-05-22 07:48:34 -07:00
reduce.cpp Fix a couple bugs (#1161) 2024-05-28 15:18:18 -07:00
reduce.h Explicit barriers with concurrent dispatch (#977) 2024-04-10 21:45:31 -07:00
rope.cpp Split encoders in non-concurrent context with a max ops per encoder (#1085) 2024-05-09 16:21:02 -07:00
scaled_dot_product_attention.cpp Metal shaders for memory efficient self attention on large sequences (#964) 2024-06-03 09:16:19 -07:00
scan.cpp fix jit scan when output doesn't have primitive (#1190) 2024-06-06 07:24:58 -07:00
slicing.cpp Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
slicing.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
softmax.cpp More jitting (#1132) 2024-05-23 16:23:44 -07:00
sort.cpp Fix multi-block sort stride management (#1169) 2024-05-31 11:10:54 -07:00
ternary.cpp Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
ternary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
unary.cpp Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
unary.h Add some internal GPU apis (#1177) 2024-06-04 09:24:26 -07:00
utils.h Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00