Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-07-14 05:31:12 +08:00)
This commit introduces a comprehensive memory estimation utility for MLX language models, supporting:

- Dynamic parameter calculation across diverse model architectures
- Handling of quantized and standard models
- Estimation of model weights, KV cache, and overhead memory
- Support for bounded and unbounded KV cache modes
- Flexible configuration via command-line arguments

The new tool provides detailed memory usage insights for different model configurations and generation scenarios.
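As a rough illustration of how the quantities listed above combine (this is not the code in `estimate_memory.py`), the sketch below sums weight storage, KV cache, and an overhead term. The `ModelShape` fields, the helper names, the flat 10% overhead, and the assumption of one 2-byte scale and one 2-byte bias per quantization group (similar in spirit to MLX's group-wise affine quantization) are all hypothetical choices made for this sketch. It also simplifies by treating every parameter as quantized.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModelShape:
    """Hypothetical summary of a decoder-only transformer config.

    Field names loosely mirror common Hugging Face config.json keys;
    they are illustrative, not the fields estimate_memory.py reads.
    """
    num_params: int    # total parameter count
    num_layers: int    # number of transformer blocks
    num_kv_heads: int  # key/value heads (fewer than query heads with GQA)
    head_dim: int      # per-head dimension


def weight_bytes(num_params: int, bits: int = 16,
                 group_size: Optional[int] = None,
                 scale_bytes: int = 2) -> int:
    """Bytes needed to store the weights.

    With group-wise quantization, assume one scale and one bias
    (each `scale_bytes` wide) per group of `group_size` weights.
    Simplification: every parameter is treated as quantized.
    """
    total = num_params * bits // 8
    if group_size is not None:
        num_groups = num_params // group_size
        total += num_groups * 2 * scale_bytes  # scale + bias per group
    return total


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2,
                   max_kv_size: Optional[int] = None) -> int:
    """Bytes for the key/value cache after `seq_len` tokens.

    Unbounded mode keeps every position; bounded mode (max_kv_size set)
    keeps at most max_kv_size positions, like a sliding-window cache.
    """
    tokens = seq_len if max_kv_size is None else min(seq_len, max_kv_size)
    # Leading factor 2: one tensor for keys and one for values, per layer.
    return (2 * shape.num_layers * batch_size * shape.num_kv_heads
            * shape.head_dim * tokens * bytes_per_elem)


def estimate_total_bytes(shape: ModelShape, seq_len: int, bits: int = 16,
                         group_size: Optional[int] = None,
                         max_kv_size: Optional[int] = None,
                         overhead_frac: float = 0.1) -> Dict[str, int]:
    """Weights + KV cache + a flat overhead fraction (an assumed value)."""
    weights = weight_bytes(shape.num_params, bits, group_size)
    kv = kv_cache_bytes(shape, seq_len, max_kv_size=max_kv_size)
    overhead = int((weights + kv) * overhead_frac)
    return {"weights": weights, "kv_cache": kv,
            "overhead": overhead, "total": weights + kv + overhead}


# Illustrative 7B-class shape (made-up round numbers, not a specific model).
shape = ModelShape(num_params=7_000_000_000, num_layers=32,
                   num_kv_heads=8, head_dim=128)
for name, max_kv in [("unbounded", None), ("bounded", 1024)]:
    est = estimate_total_bytes(shape, seq_len=4096, bits=4,
                               group_size=64, max_kv_size=max_kv)
    print(name, {k: f"{v / 2**30:.2f} GiB" for k, v in est.items()})
```

In unbounded mode the KV cache grows linearly with sequence length; bounding it trades long-range context for a fixed memory ceiling. In this sketch, a 1024-token cap cuts the cache for a 4096-token sequence to a quarter of its unbounded size.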

- `examples/`
- `models/`
- `tuner/`
- `__init__.py`
- `_version.py`
- `cache_prompt.py`
- `chat.py`
- `convert.py`
- `estimate_memory.py`
- `evaluate.py`
- `fuse.py`
- `generate.py`
- `gguf.py`
- `LORA.md`
- `lora.py`
- `MANAGE.md`
- `manage.py`
- `MERGE.md`
- `merge.py`
- `py.typed`
- `README.md`
- `requirements.txt`
- `sample_utils.py`
- `SERVER.md`
- `server.py`
- `tokenizer_utils.py`
- `UPLOAD.md`
- `utils.py`

## Generate Text with MLX and 🤗 Hugging Face
This is an example of large language model text generation that can pull models from the Hugging Face Hub.
For more information on this example, see the README in the parent directory.
This package also supports fine-tuning with LoRA or QLoRA. For more information, see the LoRA documentation (`LORA.md`).
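For orientation, the standard low-rank update from the original LoRA paper is shown below. The package's own parametrization (for example, how the scaling factor is set) may differ, so treat this as background rather than a description of `lora.py`. A frozen weight $W_0 \in \mathbb{R}^{d \times k}$ is augmented with a trainable product of two thin matrices:

$$
W = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $A$ and $B$ are trained, so each adapted layer adds $r(d + k)$ trainable parameters instead of $dk$. With $d = k = 4096$ and $r = 8$, that is $65{,}536$ versus $16{,}777{,}216$, about 0.39%. QLoRA applies the same update while keeping $W_0$ in quantized form.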