# LLMs in MLX with GGUF
An example of generating text using GGUF format models in MLX.[^1]
> [!NOTE]
> MLX is able to read most quantization formats from GGUF directly. However,
> only a few quantizations are supported natively: `Q4_0`, `Q4_1`, and `Q8_0`.
> Unsupported quantizations will be cast to `float16`.
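To see what this means in practice, you can load a GGUF file directly with `mx.load` and inspect the tensor dtypes. This is a minimal sketch rather than part of the example; it assumes a local GGUF file and an MLX version whose `mx.load` supports the `.gguf` format.

```python
import mlx.core as mx

# Load the raw tensors from a local GGUF file (the path is a placeholder).
weights = mx.load("mistral-7b-v0.1.Q8_0.gguf")

# Natively supported quantizations stay packed (integer data, often with
# separate "*_scales" / "*_biases" entries); unsupported ones appear as float16.
for name in list(weights)[:8]:
    w = weights[name]
    print(f"{name}: dtype={w.dtype}, shape={w.shape}")
```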
## Setup
Install the dependencies:
```shell
pip install -r requirements.txt
```
## Run
Run with:
```shell
python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"
```
For example, to generate text with Mistral 7B use:
```shell
python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"
```
Run `python generate.py --help` for more options.
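The CLI is a thin wrapper around the code in `models.py`, so you can also drive it from Python. The sketch below is hypothetical: the `load` and `generate` names and signatures are assumptions about how `generate.py` uses `models.py`, so check those files for the current interface.

```python
import models  # models.py from this example directory

# Assumed interface: load the weights/config from a GGUF file hosted in a
# Hugging Face repo and return (model, tokenizer).
model, tokenizer = models.load(
    "mistral-7b-v0.1.Q8_0.gguf", "TheBloke/Mistral-7B-v0.1-GGUF"
)

# Assumed helper for sampling tokens from a prompt; see generate.py for the
# actual generation loop and its parameters.
models.generate(model, tokenizer, "Write a quicksort in Python", max_tokens=128)
```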
Models that have been tested and work include:
- [TheBloke/Mistral-7B-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF), for quantized models use:
  - `mistral-7b-v0.1.Q8_0.gguf`
  - `mistral-7b-v0.1.Q4_0.gguf`
- [TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF), for quantized models use:
  - `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`
  - `tinyllama-1.1b-chat-v1.0.Q4_0.gguf`
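If you want to fetch one of these files ahead of time (for example to avoid a download when you first run `generate.py`), `huggingface_hub`'s `hf_hub_download` will place it in the local Hugging Face cache. A small sketch using names from the list above:

```python
from huggingface_hub import hf_hub_download

# Download a specific GGUF file into the local Hugging Face cache.
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
)
print(path)  # local path to the cached .gguf file
```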
[^1]: For more information on GGUF see the documentation.