Mirror of https://github.com/ml-explore/mlx.git, synced 2025-10-25 12:48:14 +08:00
Microbenchmarks comparing MLX to PyTorch
The same microbenchmarks are implemented in MLX and PyTorch so that the two can be compared directly and the largest potential performance improvements and regressions can be listed.
For instance, run python bench_mlx.py sum_axis --size 8x1024x128 --axis 2 --cpu
to measure the time it takes to sum across the third axis of an 8x1024x128
tensor on the CPU.
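The command above can be read as "parse a size string, then time an operation repeatedly". The following is a minimal sketch of that pattern using only the standard library. It is not taken from bench_mlx.py; the names parse_size and measure, and the warmup and iters defaults, are hypothetical. Because MLX evaluates arrays lazily, a real MLX benchmark would also need to force evaluation (for example with mx.eval) inside the timed function.

```python
import time


def parse_size(text):
    """Turn a size string such as "8x1024x128" into a tuple of ints."""
    return tuple(int(dim) for dim in text.split("x"))


def measure(fn, *args, warmup=5, iters=20):
    """Return the mean wall-clock time in seconds of one call to fn(*args).

    The warmup calls are not timed, so one-off costs such as caching or
    compilation do not distort the result. Both defaults are illustrative.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    shape = parse_size("8x1024x128")
    # Stand-in workload: summing a flat Python list with as many elements
    # as the last axis. A real benchmark would sum a tensor along an axis.
    data = list(range(shape[2]))
    print(f"shape={shape} mean time per call: {measure(sum, data):.3e} s")
```

Passing the operation as a callable keeps the timing loop identical for both frameworks, so only the workload differs between the two measurements.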
compare.py runs several benchmarks and reports the speed-up, or lack thereof,
of MLX relative to PyTorch.
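One common way to express such a comparison is the ratio of the two measured times. The sketch below assumes that definition; it does not reproduce the formula or the output format used by compare.py, and the function names and the example timings are hypothetical.

```python
def speedup(reference_time, candidate_time):
    """Return how many times faster the candidate is than the reference.

    A value above 1.0 means the candidate is faster, and a value below 1.0
    means it is slower. Both times must use the same unit.
    """
    if reference_time <= 0 or candidate_time <= 0:
        raise ValueError("times must be positive")
    return reference_time / candidate_time


def describe(name, reference_time, candidate_time):
    """Format one comparison line for a benchmark (illustrative layout)."""
    ratio = speedup(reference_time, candidate_time)
    verdict = "faster" if ratio >= 1.0 else "slower"
    return f"{name}: {ratio:.2f}x ({verdict})"


if __name__ == "__main__":
    # Made-up timings in seconds, used only to show the output shape.
    print(describe("sum_axis", reference_time=2.0e-3, candidate_time=1.0e-3))
```

A ratio is used here rather than a time difference so that results from benchmarks with very different absolute run times remain comparable.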
Each bench script can be run with --print-pid, which prints the PID and waits
for a key press so that a debugger can be attached more easily.
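The behaviour described above can be sketched with argparse and os.getpid. This is an illustration of the idea, not the code used by the bench scripts: the function name maybe_wait_for_debugger and the prompt text are hypothetical, and this version waits for Enter rather than for an arbitrary key.

```python
import argparse
import os


def maybe_wait_for_debugger(argv=None, wait=input):
    """Print the PID and pause if --print-pid is among the arguments.

    argv defaults to the process arguments. wait is the blocking call used
    for the pause and is injectable so the function can be tested.
    Returns True if the flag was given.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--print-pid", action="store_true")
    # parse_known_args ignores the benchmark's own options.
    args, _ = parser.parse_known_args(argv)
    if args.print_pid:
        print(f"PID: {os.getpid()}")
        wait("Attach a debugger, then press Enter to continue... ")
    return args.print_pid


if __name__ == "__main__":
    maybe_wait_for_debugger()
```

Calling this before any benchmark work starts gives the debugger a chance to attach to the printed PID while the process is still idle.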