Mirror of https://github.com/ml-explore/mlx-examples.git (synced 2025-09-01 04:14:38 +08:00)
Example reading directly from gguf file (#222)
* Draft of tiny llama from gguf
* Transpose all
* No transposition with new layout
* Read config from gguf
* Create tokenizer from gguf
* move gguf and update to be similar to hf_llm
* change model to HF style + updates to README
* nits in README
* nit readme
* only use mlx for metadata
* fix eos/bos tokenizer
* fix tokenization
* quantization runs
* 8-bit works
* tokenizer fix
* bump mlx version

---------

Co-authored-by: Juarez Bochi <juarez.bochi@grammarly.com>
Co-authored-by: Awni Hannun <awni@apple.com>
llms/gguf_llm/README.md (normal file, 52 lines changed)
@@ -0,0 +1,52 @@
# LLMs in MLX with GGUF

An example of generating text with GGUF-format models in MLX.[^1]

> [!NOTE]
> MLX can read most GGUF quantization formats. However, only a few
> quantizations are supported natively: `Q4_0`, `Q4_1`, and `Q8_0`.
> Weights in unsupported quantizations are cast to `float16`.

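As a rough illustration of the note above, the sketch below guesses from a GGUF file name whether its weights fall in the natively supported set or would be cast to `float16`. The helper names (`quant_tag`, `expected_handling`) are hypothetical and not part of this example's code, and the sketch assumes the quantization tag is the last dot-separated field before `.gguf`, as in the file names used in this README. It does not inspect the file itself.

```python
import re
from typing import Optional

# Quantization types the note above lists as supported natively.
NATIVE_QUANTS = {"Q4_0", "Q4_1", "Q8_0"}

# Assumption: the tag is the last dot-separated field before ".gguf",
# e.g. "mistral-7b-v0.1.Q8_0.gguf" -> "Q8_0".
_QUANT_TAG = re.compile(r"\.([A-Za-z0-9_]+)\.gguf$")


def quant_tag(filename: str) -> Optional[str]:
    """Return the upper-cased quantization tag, or None if none is found."""
    match = _QUANT_TAG.search(filename)
    return match.group(1).upper() if match else None


def expected_handling(filename: str) -> str:
    """Classify a file name as 'native', 'cast to float16', or 'unknown'."""
    tag = quant_tag(filename)
    if tag is None:
        return "unknown"
    return "native" if tag in NATIVE_QUANTS else "cast to float16"
```

For instance, `expected_handling("mistral-7b-v0.1.Q8_0.gguf")` returns `"native"`, while a file name tagged `Q5_K_M` is reported as `"cast to float16"`.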
## Setup

Install the dependencies:

```bash
pip install -r requirements.txt
```

### Run

Run with:

```bash
python generate.py \
    --repo <hugging_face_repo> \
    --gguf <file.gguf> \
    --prompt "Write a quicksort in Python"
```

For example, to generate text with Mistral 7B use:

```bash
python generate.py \
    --repo TheBloke/Mistral-7B-v0.1-GGUF \
    --gguf mistral-7b-v0.1.Q8_0.gguf \
    --prompt "Write a quicksort in Python"
```

Run `python generate.py --help` for more options.

Models that have been tested and work include:

- [TheBloke/Mistral-7B-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF),
  for quantized models use:
  - `mistral-7b-v0.1.Q8_0.gguf`
  - `mistral-7b-v0.1.Q4_0.gguf`

- [TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF),
  for quantized models use:
  - `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`
  - `tinyllama-1.1b-chat-v1.0.Q4_0.gguf`

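The tested combinations above can be written down as a small table and turned into `generate.py` command lines. The sketch below is only a convenience for scripting: `TESTED_MODELS`, `build_command`, and `all_commands` are hypothetical names, and the only flags used are `--repo`, `--gguf`, and `--prompt` as shown in the Run section.

```python
import shlex

# Tested repo -> GGUF files, copied from the list above.
TESTED_MODELS = {
    "TheBloke/Mistral-7B-v0.1-GGUF": [
        "mistral-7b-v0.1.Q8_0.gguf",
        "mistral-7b-v0.1.Q4_0.gguf",
    ],
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF": [
        "tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
        "tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
    ],
}


def build_command(repo: str, gguf: str, prompt: str) -> str:
    """Return a shell-quoted generate.py invocation for one repo/file pair."""
    args = ["python", "generate.py", "--repo", repo, "--gguf", gguf, "--prompt", prompt]
    return shlex.join(args)


def all_commands(prompt: str) -> list:
    """Return one command line per tested repo/file combination."""
    return [
        build_command(repo, gguf, prompt)
        for repo, files in TESTED_MODELS.items()
        for gguf in files
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; each run downloads a model.
    for command in all_commands("Write a quicksort in Python"):
        print(command)
```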
[^1]: For more information on GGUF see [the documentation](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md).