Mirror of https://github.com/ml-explore/mlx.git, synced 2025-06-24 17:31:16 +08:00.
Latest commit (fast attention kernel work):

* Rough INIT
* [WIP]: Loading and Matmuls added
* [WIP]: Reductions and min working aligned kernel at headdim = 64
* [WIP] Added headdim 80 for testing
* [WIP] Update dispatch params for testing
* [WIP] Add support for unaligned seq lengths - still looks messy
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Update sdpa_benchmarks
* Enable gqa support
* Update benchmark and switch off 128 headdim
* Update headdim 128 tuning
* Remove older fast attention code. Write out O strided
* Disable hd=128 until further optimizations
* Enable bf16
* Fix data size bug
* Enable attn build outside of jit
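A minimal usage sketch (not code from this repository) of MLX's `mx.fast.scaled_dot_product_attention` op, exercising the features the commits above mention: bf16 inputs, GQA (fewer key/value heads than query heads), and headdim 64. The batch size, sequence length, and head counts below are illustrative assumptions.

```python
import mlx.core as mx

B, L, head_dim = 1, 1024, 64
n_q_heads, n_kv_heads = 32, 8  # GQA: 32 query heads share 8 K/V heads

q = mx.random.normal((B, n_q_heads, L, head_dim), dtype=mx.bfloat16)
k = mx.random.normal((B, n_kv_heads, L, head_dim), dtype=mx.bfloat16)
v = mx.random.normal((B, n_kv_heads, L, head_dim), dtype=mx.bfloat16)

out = mx.fast.scaled_dot_product_attention(q, k, v, scale=head_dim ** -0.5)
mx.eval(out)  # MLX is lazy; force the kernel to actually run
print(out.shape, out.dtype)  # (1, 32, 1024, 64) bfloat16
```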
Directory contents:

* blas/
* comparative/
* batch_matmul_bench.py
* compile_bench.py
* conv1d_bench.py
* conv2d_bench_cpu.py
* conv2d_train_bench_cpu.py
* conv2d_transpose_bench_cpu.py
* conv3d_bench_cpu.py
* conv3d_train_bench_cpu.py
* conv3d_transpose_bench_cpu.py
* conv_bench.py
* conv_transpose_bench.py
* distributed_bench.py
* einsum_bench.py
* fft_bench.py
* gather_bench.py
* hadamard_bench.py
* layer_norm_bench.py
* rms_norm_bench.py
* rope_bench.py
* scatter_bench.py
* sdpa_bench.py
* sdpa_vector_bench.py
* single_ops.py
* time_utils.py
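For orientation, here is a hypothetical timing helper in the spirit of the benchmark scripts listed above (this is not the actual contents of `time_utils.py`). The key detail for MLX benchmarks is forcing the lazy graph to evaluate on every iteration so the measurement covers real kernel execution, not just graph construction.

```python
import time
import mlx.core as mx

def time_fn(fn, *args, warmup=5, iters=100):
    """Return the mean wall-clock time per call of fn(*args) in milliseconds."""
    for _ in range(warmup):
        mx.eval(fn(*args))          # warm-up: compile/cache kernels first
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(fn(*args))          # evaluate each iteration before timing stops
    return (time.perf_counter() - start) / iters * 1e3

if __name__ == "__main__":
    a = mx.random.normal((1024, 1024))
    b = mx.random.normal((1024, 1024))
    print(f"matmul: {time_fn(mx.matmul, a, b):.3f} ms")
```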