Microbenchmarks comparing MLX to PyTorch
========================================

Implement the same microbenchmarks in MLX and PyTorch to compare them and to
build a list of the largest performance improvements and/or regressions.

For instance, run `python bench_mlx.py sum_axis --size 8x1024x128 --axis 2 --cpu`
to measure the time it takes to sum across the 3rd axis of a tensor of that
size on the CPU.

`compare.py` runs several benchmarks and reports the speed-up, or lack thereof,
relative to PyTorch.

Each bench script can be run with `--print-pid` to print the PID and wait for a
key press, which makes it easier to attach a debugger.
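
The command-line shape described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `bench_mlx.py`: the helper names (`parse_size`, `build_parser`, `measure`, `wait_for_debugger`), the default values, and the warmup and iteration counts are all assumptions made for the example.

```python
import argparse
import os
import time


def parse_size(text):
    """Turn a string such as '8x1024x128' into a tuple of ints."""
    return tuple(int(dim) for dim in text.split("x"))


def build_parser():
    # Hypothetical parser mirroring the flags mentioned in this README;
    # the real scripts may define them differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("benchmark")
    parser.add_argument("--size", type=parse_size, default=(8, 1024, 128))
    parser.add_argument("--axis", type=int, default=0)
    parser.add_argument("--cpu", action="store_true")
    parser.add_argument("--print-pid", action="store_true")
    return parser


def measure(fn, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn."""
    # Warmup calls are not timed (assumed counts, chosen for illustration).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def wait_for_debugger():
    """Print the PID, then block until Enter is pressed."""
    print(f"PID: {os.getpid()}")
    input("Press Enter to continue...")
```

A script built this way would call `build_parser().parse_args()`, call `wait_for_debugger()` when `--print-pid` is set, and pass the benchmark body to `measure`. Because MLX evaluates lazily, the timed callable would need to force evaluation (for example with `mx.eval`) for the measurement to reflect the actual computation.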