LLMs in MLX with GGUF

An example of generating text with GGUF format models in MLX.¹

Note

MLX is able to read most quantization formats from GGUF directly. However, only a few quantizations are supported natively: Q4_0, Q4_1, and Q8_0. Unsupported quantizations will be cast to float16.
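
To see what MLX ends up with for a given file, you can load the GGUF directly with mlx.core.load and inspect the tensors. This is only a quick sketch, not part of the example's code; the file name is the Mistral model used below.

import mlx.core as mx

# Load a GGUF file directly; mx.load infers the format from the file
# extension and returns a dict mapping tensor names to mx.array.
weights = mx.load("mistral-7b-v0.1.Q8_0.gguf")

# Inspect a few tensors. Per the note above, tensors stored in an
# unsupported GGUF quantization are cast to float16 on load.
for name, w in list(weights.items())[:5]:
    print(name, w.shape, w.dtype)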

Setup

Install the dependencies:

pip install -r requirements.txt

Run

Run with:

python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"

For example, to generate text with Mistral 7B use:

python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"

Run python generate.py --help for more options.
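
The --repo and --gguf arguments name a Hugging Face repo and a GGUF file inside it. If you want that file on disk for your own experiments, one way to fetch it is the sketch below; it assumes the huggingface_hub package is installed (install it with pip if it is not already available).

from huggingface_hub import hf_hub_download
import mlx.core as mx

# Download the same GGUF file the command above uses and note where it lands.
path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GGUF",
    filename="mistral-7b-v0.1.Q8_0.gguf",
)
print("GGUF file downloaded to:", path)

# Optionally load it to confirm MLX can read it (see the note on
# supported quantizations above).
weights = mx.load(path)
print(len(weights), "tensors loaded")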

Models that have been tested and work include:

- TheBloke/Mistral-7B-v0.1-GGUF, for example with the mistral-7b-v0.1.Q8_0.gguf file used above

1. For more information on GGUF see the documentation.