mlx/mlx/backend/metal/kernels
Awni Hannun 2419edd5b2
Faster indexing math in a few kernels (#1589)
* wip: faster compiled kernels
* faster general unary with uint specialization
* index type in compiled, unary, binary, ternary, copy
* fix jit
* jit fix
* specialize gather + scatter
* nit in docs
2024-11-18 19:52:00 -08:00
fft Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
jit Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
metal_3_0 Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
metal_3_1 Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
reduction Reductions update (#1351) 2024-11-04 22:25:16 -08:00
steel Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
arange.h More jitting (#1132) 2024-05-23 16:23:44 -07:00
arange.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
arg_reduce.metal More fixes for arrays with large sizes (#1405) 2024-09-17 12:46:31 -07:00
atomic.h Refactor reductions and fix scatter atomics for large sizes (#1300) 2024-08-22 16:03:31 -07:00
bf16_math.h Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
binary_ops.h Fix complex power on Metal (#1460) 2024-10-06 19:58:30 -07:00
binary_two.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
binary_two.metal Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
binary.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
binary.metal Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
CMakeLists.txt Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
complex.h Refactor reductions and fix scatter atomics for large sizes (#1300) 2024-08-22 16:03:31 -07:00
conv.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
copy.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
copy.metal Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
defines.h Refactor reductions and fix scatter atomics for large sizes (#1300) 2024-08-22 16:03:31 -07:00
erf.h JIT compile option for binary minimization (#1091) 2024-05-22 12:57:13 -07:00
expm1f.h Fix overflow / underflow handling for expm1f (#1278) 2024-07-23 07:29:06 -07:00
fft.h Feature complete Metal FFT (#1102) 2024-06-06 12:57:25 -07:00
fft.metal Add Quantized Ops to the JIT (#1204) 2024-06-12 09:47:12 -07:00
gather.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
gemv_masked.h Add gemv masked to JIT plus some fixes (#1310) 2024-08-07 13:38:07 -07:00
gemv_masked.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
gemv.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
hadamard.h Fix bfloat16 Hadamard (#1283) 2024-07-23 14:54:43 -07:00
indexing.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
layer_norm.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
quantized.h OOB QMV fix (#1579) 2024-11-08 17:59:45 -08:00
quantized.metal Add split_k qvm for long context (#1564) 2024-11-05 11:25:19 -08:00
random.metal Faster bits and bernoulli (#1535) 2024-10-28 11:11:00 -07:00
reduce_utils.h More jitting (#1132) 2024-05-23 16:23:44 -07:00
reduce.h Fix JIT reductions (#1373) 2024-08-28 16:39:11 -07:00
reduce.metal Reductions update (#1351) 2024-11-04 22:25:16 -08:00
rms_norm.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
rope.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
scaled_dot_product_attention_params.h Metal shaders for memory efficient self attention on large sequences (#964) 2024-06-03 09:16:19 -07:00
scaled_dot_product_attention.metal 2-Pass Sdpa Inference Kernel (#1597) 2024-11-18 17:31:53 -08:00
scan.h Working 64-bit scans (#1506) 2024-10-24 11:05:46 -07:00
scan.metal Working 64-bit scans (#1506) 2024-10-24 11:05:46 -07:00
scatter.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
sdpa_vector.h 2-Pass Sdpa Inference Kernel (#1597) 2024-11-18 17:31:53 -08:00
softmax.h consistently handle all -inf in softmax (#1470) 2024-10-08 09:54:02 -07:00
softmax.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
sort.h More fixes for arrays with large sizes (#1405) 2024-09-17 12:46:31 -07:00
sort.metal Dispatch bf16 at run time when using the JIT (#1584) 2024-11-15 16:54:36 -08:00
ternary_ops.h JIT compile option for binary minimization (#1091) 2024-05-22 12:57:13 -07:00
ternary.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
ternary.metal Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
unary_ops.h Real and Imag (#1490) 2024-10-15 16:23:15 -07:00
unary.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
unary.metal Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00
utils.h Faster indexing math in a few kernels (#1589) 2024-11-18 19:52:00 -08:00