# LLMs in MLX with GGUF

An example generating text using GGUF format models in MLX.[^1]
> [!NOTE]
> MLX is able to read most quantization formats from GGUF directly. However, only a few quantizations are supported natively: `Q4_0`, `Q4_1`, and `Q8_0`. Unsupported quantizations will be cast to `float16`.
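To see this casting behavior for yourself, here is a minimal sketch (not part of this example's code) that reads a local GGUF file with `mx.load` and prints a few tensor dtypes. The filename is a placeholder for whichever GGUF file you have downloaded.

```python
import mlx.core as mx

# Sketch: mx.load can read GGUF files directly; the path below is a
# placeholder for a locally downloaded file.
weights = mx.load("mistral-7b-v0.1.Q8_0.gguf")

# Inspect a few tensors; weights from unsupported quantizations
# show up as float16.
for name, w in list(weights.items())[:5]:
    print(name, w.shape, w.dtype)
```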
## Setup

Install the dependencies:

```shell
pip install -r requirements.txt
```
## Run

Run with:

```shell
python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"
```

For example, to generate text with Mistral 7B use:

```shell
python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"
```

Run `python generate.py --help` for more options.
Models that have been tested and work include the following (a sketch for downloading these files programmatically appears after the list):

- `TheBloke/Mistral-7B-v0.1-GGUF`, for quantized models use:
  - `mistral-7b-v0.1.Q8_0.gguf`
  - `mistral-7b-v0.1.Q4_0.gguf`
- `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF`, for quantized models use:
  - `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`
  - `tinyllama-1.1b-chat-v1.0.Q4_0.gguf`
- `Jaward/phi-3-mini-4k-instruct.Q4_0.gguf`, for the 4-bit quantized phi-3-mini-4k-instruct use:
  - `phi-3-mini-4k-instruct.Q4_0.gguf`
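If you want to fetch one of these GGUF files outside of `generate.py`, the sketch below is one way to do it. It assumes `huggingface_hub` is available and that `mx.load` accepts `return_metadata=True` for GGUF files; the repo and filename are taken from the tested models above, and `general.architecture` is a standard GGUF metadata key rather than something specific to this example.

```python
from huggingface_hub import hf_hub_download

import mlx.core as mx

# Download one of the tested GGUF files from the Hugging Face Hub.
gguf_path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
)

# Load the tensors and (assumption) the GGUF key/value metadata.
weights, metadata = mx.load(gguf_path, return_metadata=True)
print(len(weights), "tensors; architecture:", metadata.get("general.architecture"))
```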
[^1]: For more information on GGUF see the documentation.