# Generate Text with MLX and 🤗 Hugging Face

This is an example of large language model text generation that can pull models from the Hugging Face Hub.

## Setup

Install the dependencies:

```shell
pip install -r requirements.txt
```

## Run

```shell
python generate.py --model <model_path> --prompt "hello"
```

For example:

```shell
python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

will download the Mistral 7B model and generate text using the given prompt.

The `<model_path>` should be either a path to a local directory or a Hugging Face repo with weights stored in `safetensors` format. If you use a repo from the Hugging Face Hub, then the model will be downloaded and cached the first time you run it. See the [Models](#models) section for a full list of supported models.
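Phi-2 style models are supported as well (see the Models section below), so pointing `--model` at a Phi-2 repo works the same way. Here `microsoft/phi-2` is used as the standard Hub repo for that model:

```shell
# Download (and cache) Phi-2 from the Hub, then generate:
python generate.py --model microsoft/phi-2 --prompt "hello"
```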

Run `python generate.py --help` to see all the options.
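For instance, generation length and sampling temperature can typically be adjusted. The flag names below are assumptions based on similar MLX examples, so confirm them with `--help`:

```shell
# Longer, more exploratory generation
# (--max-tokens and --temp are assumed flag names; verify with --help):
python generate.py \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "Write a haiku about the ocean." \
    --max-tokens 200 \
    --temp 0.8
```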

## Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

For example, `mistralai/Mistral-7B-v0.1` (used above) works, and most Mistral, Llama, and Phi-2 style models on the Hub should work out of the box.

## Convert new models

You can convert (change the data type or quantize) models using the `convert.py` script. This script takes a Hugging Face repo as input and outputs a model directory (which you can optionally also upload to Hugging Face).

For example, to make a 4-bit quantized model, run:

```shell
python convert.py --hf-path <hf_repo> -q
```
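The converted weights are written to a local directory (assumed here to be a default `mlx_model/`; check `convert.py --help` for the exact option and default), which you can pass straight back to the generation script:

```shell
# Generate with the freshly quantized local model
# (mlx_model is an assumed default output directory):
python generate.py --model mlx_model --prompt "hello"
```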

For more options run:

```shell
python convert.py --help
```

You can upload new models to Hugging Face by specifying `--upload-repo` to `convert.py`. For example, to upload a quantized Mistral-7B model to the [MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:

```shell
python convert.py \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```