Mirror of https://github.com/ml-explore/mlx.git (synced 2025-06-25 01:41:17 +08:00)
Microbenchmarks comparing MLX to PyTorch
This directory implements the same microbenchmarks in MLX and PyTorch so the two can be compared directly and the largest potential performance improvements and regressions can be listed.
Run, for instance, `python bench_mlx.py sum_axis --size 8x1024x128 --axis 2 --cpu` to measure the time it takes to sum across the 3rd axis of an 8x1024x128 tensor on the CPU.
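To make the command above more concrete, here is a minimal sketch of what such a microbenchmark might do internally. It is not the code in `bench_mlx.py`: the helpers `parse_size`, `time_fn` and `sum_last_axis` are hypothetical names, and a pure-Python list sum stands in for the real array operation so the example runs without MLX or PyTorch installed.

```python
import time


def parse_size(text):
    """Turn a size string such as '8x1024x128' into a tuple of ints."""
    return tuple(int(dim) for dim in text.split("x"))


def time_fn(fn, *args, warmup=2, iters=10):
    """Return the mean wall-clock seconds per call of fn(*args)."""
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters


def sum_last_axis(rows):
    """Pure-Python stand-in for an array sum over the last axis."""
    return [sum(row) for row in rows]


if __name__ == "__main__":
    shape = parse_size("8x1024x128")
    # Flatten the two leading axes so each row holds the axis-2 elements.
    rows = [[1.0] * shape[2] for _ in range(shape[0] * shape[1])]
    elapsed = time_fn(sum_last_axis, rows)
    print(f"sum over axis 2 of {shape}: {elapsed * 1e3:.3f} ms per call")
```

The warm-up iterations are a common way to keep one-time setup costs out of the reported average; whether the actual scripts do the same is an assumption here.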
`compare.py` runs several benchmarks and reports the speed-up, or lack thereof, of MLX relative to PyTorch.
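As an illustration of the kind of comparison involved, the sketch below turns two timings into a speed-up ratio. The functions `speedup` and `report` are hypothetical and the exact formula used by `compare.py` is not stated here, so treat the ratio definition (PyTorch time divided by MLX time) as an assumption.

```python
def speedup(mlx_seconds, torch_seconds):
    """Assumed definition: a ratio above 1 means MLX was faster."""
    return torch_seconds / mlx_seconds


def report(name, mlx_seconds, torch_seconds):
    """Format one comparison line for a named benchmark."""
    ratio = speedup(mlx_seconds, torch_seconds)
    verdict = "faster" if ratio >= 1.0 else "slower"
    return f"{name}: MLX/PyTorch speed-up {ratio:.2f}x (MLX is {verdict})"


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    print(report("sum_axis", mlx_seconds=1.0, torch_seconds=2.0))
```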
Each bench script can be run with `--print-pid` to print its PID and wait for a key press, which makes it easier to attach a debugger.
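A flag like this can be sketched with the standard library alone. The code below is an assumption about how such a flag might be wired up, not the implementation in the bench scripts; `build_parser`, `wait_for_debugger` and `main` are hypothetical names.

```python
import argparse
import os


def build_parser():
    """Build a parser with a --print-pid switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Example benchmark script")
    parser.add_argument(
        "--print-pid",
        action="store_true",
        help="print the PID and wait for a key press before running",
    )
    return parser


def wait_for_debugger(enabled, wait=input):
    """Print the PID and block until `wait` returns; do nothing if disabled."""
    if not enabled:
        return None
    pid = os.getpid()
    print(f"PID: {pid}")
    # The process pauses here, leaving time to attach a debugger to the PID.
    wait("Attach a debugger, then press Enter to continue...")
    return pid


def main(argv=None):
    """Parse arguments, optionally pause, then run the benchmark."""
    args = build_parser().parse_args(argv)
    wait_for_debugger(args.print_pid)
    # ... the benchmark itself would run here ...
```

Passing the blocking call in as the `wait` parameter keeps the pause easy to replace in tests or non-interactive runs.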