mlx-examples/llms/gguf_llm
锦此 f7bbe458ae Add timeout to generate functions
Add timeout handling to various `generate` functions across multiple files.

* **cvae/main.py**
  - Add `timeout` parameter to `generate` function.
  - Implement timeout handling using `signal` module in `generate` function.

* **flux/dreambooth.py**
  - Add `timeout` parameter to `generate_progress_images` function.
  - Implement timeout handling using `signal` module in `generate_progress_images` function.

* **musicgen/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **stable_diffusion/txt2image.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **llava/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

* **llms/gguf_llm/generate.py**
  - Add `timeout` parameter to `generate` function.
  - Implement timeout handling using `signal` module in `generate` function.

* **llms/mlx_lm/generate.py**
  - Add `timeout` parameter to `main` function.
  - Implement timeout handling using `signal` module in `main` function.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/jincdream/mlx-examples?shareId=XXXX-XXXX-XXXX-XXXX).
2024-10-22 17:06:58 +08:00
..
generate.py Add timeout to generate functions 2024-10-22 17:06:58 +08:00
models.py Add Supported Quantized Phi-3-mini-4k-instruct gguf Weight (#717) 2024-04-29 20:11:32 -07:00
README.md Add Supported Quantized Phi-3-mini-4k-instruct gguf Weight (#717) 2024-04-29 20:11:32 -07:00
requirements.txt fixed the requirements (#803) 2024-05-29 06:14:19 -07:00
utils.py Example reading directly from gguf file (#222) 2024-01-23 15:41:54 -08:00

LLMs in MLX with GGUF

An example generating text using GGUF format models in MLX.1

Note

MLX is able to read most quantization formats from GGUF directly. However, only a few quantizations are supported directly: Q4_0, Q4_1, and Q8_0. Unsupported quantizations will be cast to float16.

Setup

Install the dependencies:

pip install -r requirements.txt

Run

Run with:

python generate.py \
  --repo <hugging_face_repo> \
  --gguf <file.gguf> \
  --prompt "Write a quicksort in Python"

For example, to generate text with Mistral 7B use:

python generate.py \
  --repo TheBloke/Mistral-7B-v0.1-GGUF \
  --gguf mistral-7b-v0.1.Q8_0.gguf \
  --prompt "Write a quicksort in Python"

Run python generate.py --help for more options.

Models that have been tested and work include:


  1. For more information on GGUF see the documentation. ↩︎