Mirror of https://github.com/ml-explore/mlx.git (synced 2025-06-25 09:51:17 +08:00)
Latest commit message:

* Metal shaders for efficient self attention on large sequences
  Updated fast attention: GEMM-ified with Steel primitives
  Uses flash attention 1 for scale correction
* more compiler silencing
* Address rebase issues
* Templatize kernel instantiation, revise cpu bindings
* Safer writes to output
* Permit batch size > 1
* Numerical fixes for sdpa self attention
* Re-enable test, remove unused variable
* add benchmarking script
* Disable sdpa prior to perf tuning, and simplify tests for per-patch CI

Name | ||
---|---|---|
blas | ||
comparative | ||
batch_matmul_bench.py | ||
compile_bench.py | ||
conv1d_bench.py | ||
conv_bench.py | ||
fft_bench.py | ||
gather_bench.py | ||
layer_norm_bench.py | ||
rms_norm_bench.py | ||
rope_bench.py | ||
scatter_bench.py | ||
sdpa_bench.py | ||
single_ops.py | ||
time_utils.py |