LLMs in MLX with GGUF
An example of generating text using GGUF format models in MLX.
> [!NOTE]
> MLX can read most quantization formats from GGUF directly. However, only the
> `Q4_0`, `Q4_1`, and `Q8_0` quantizations are supported natively; unsupported
> quantizations are cast to `float16`.
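As background on what "supported natively" means, GGUF stores quantized weights in fixed-size blocks of 32 values that share one scale. A minimal sketch of `Q8_0`-style dequantization (illustrative only, not MLX's actual implementation):

```python
def dequantize_q8_0(scales, quants, block_size=32):
    """Dequantize Q8_0-style data: each block of `block_size` int8
    values shares one float scale; weight = scale * quant."""
    out = []
    for block, scale in enumerate(scales):
        start = block * block_size
        for q in quants[start:start + block_size]:
            out.append(scale * q)
    return out

# Example: two blocks of 32 int8 values, with per-block scales 0.5 and 0.25.
quants = list(range(-16, 16)) + [2] * 32
weights = dequantize_q8_0([0.5, 0.25], quants)
```

Formats with more elaborate block layouts than this are the ones MLX falls back to `float16` for.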
Setup
Install the dependencies:
```shell
pip install -r requirements.txt
```
Run
Run with:
```shell
python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"
```
For example, to generate text with Mistral 7B, use:
```shell
python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"
```
Run `python generate.py --help` for more options.
Models that have been tested and work include:
- TheBloke/Mistral-7B-v0.1-GGUF, for quantized models use:
  - `mistral-7b-v0.1.Q8_0.gguf`
  - `mistral-7b-v0.1.Q4_0.gguf`
- TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF, for quantized models use:
  - `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`
  - `tinyllama-1.1b-chat-v1.0.Q4_0.gguf`
- Jaward/phi-3-mini-4k-instruct.Q4_0.gguf, for the 4-bit quantized phi-3-mini-4k-instruct use:
  - `phi-3-mini-4k-instruct.Q4_0.gguf`
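The `Q4_0` files in the list above pack two 4-bit values per byte, with one scale per 32-weight block and an implicit offset of 8. A hedged sketch of unpacking one such block (following the standard GGUF Q4_0 layout; illustrative, not MLX's code):

```python
def dequantize_q4_0_block(scale, packed):
    """Unpack one Q4_0 block: 16 packed bytes -> 32 weights.
    Low nibbles hold weights 0-15, high nibbles weights 16-31;
    each 4-bit value is offset by 8, then multiplied by the scale."""
    assert len(packed) == 16
    low = [scale * ((b & 0x0F) - 8) for b in packed]   # weights 0-15
    high = [scale * ((b >> 4) - 8) for b in packed]    # weights 16-31
    return low + high

# A block where every byte packs nibble 0x9 (low) and 0x7 (high):
w = dequantize_q4_0_block(0.5, bytes([0x79] * 16))
```

This block structure is why a `Q4_0` file is roughly half the size of its `Q8_0` counterpart at some cost in precision.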
For more information on GGUF, see the documentation.