This commit introduces a comprehensive memory estimation utility for MLX language models, supporting:
- Dynamic parameter calculation across diverse model architectures
- Handling of quantized and standard models
- Estimation of model weights, KV cache, and overhead memory
- Support for bounded and unbounded KV cache modes
- Flexible configuration via command-line arguments
The new tool provides detailed memory usage insights for different model configurations and generation scenarios.
* Make sure to import the correct "version" module when installing the
mlx_whisper package from local source code.
* Make sure to import the correct "version" module when installing the mlx_lm package from local source code
* fix
---------
Co-authored-by: Awni Hannun <awni@apple.com>