mirror of
https://github.com/ml-explore/mlx-examples.git
synced 2025-09-23 06:58:10 +08:00

* use nn.RMSNorm, use sdpa, cleanup * bump mlx versions * minor update * use fast layer norm * version bump * update requirement for whisper * update requirement for gguf
LLMs in MLX with GGUF
An example generating text using GGUF format models in MLX.1
Note
MLX is able to read most quantization formats from GGUF directly. However, only a few quantizations are supported directly:
Q4_0
,Q4_1
, andQ8_0
. Unsupported quantizations will be cast tofloat16
.
Setup
Install the dependencies:
pip install -r requirements.txt
Run
Run with:
python generate.py \
--repo <hugging_face_repo> \
--gguf <file.gguf> \
--prompt "Write a quicksort in Python"
For example, to generate text with Mistral 7B use:
python generate.py \
--repo TheBloke/Mistral-7B-v0.1-GGUF \
--gguf mistral-7b-v0.1.Q8_0.gguf \
--prompt "Write a quicksort in Python"
Run python generate.py --help
for more options.
Models that have been tested and work include:
-
TheBloke/Mistral-7B-v0.1-GGUF, for quantized models use:
mistral-7b-v0.1.Q8_0.gguf
mistral-7b-v0.1.Q4_0.gguf
-
TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF, for quantized models use:
tinyllama-1.1b-chat-v1.0.Q8_0.gguf
tinyllama-1.1b-chat-v1.0.Q4_0.gguf
-
For more information on GGUF see the documentation. ↩︎