# Generate Text with MLX and 🤗 Hugging Face

This is an example of large language model text generation that can pull models from the Hugging Face Hub.

### Setup

Install the dependencies:

```shell
pip install -r requirements.txt
```

### Run

```shell
python generate.py --model <model_path> --prompt "hello"
```

For example:

```shell
python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

will download the Mistral 7B model and generate text using the given prompt.
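Conceptually, the script tokenizes the prompt and then samples one token at a time until it hits an end-of-sequence token or the token limit. Here is a simplified sketch of that loop; the `model` and `tokenizer` objects and the greedy sampling are illustrative assumptions, not the example's exact code (the real `generate.py` works on MLX arrays and supports temperature sampling):

```python
# Simplified sketch of a token-by-token generation loop.
def generate(model, tokenizer, prompt: str, max_tokens: int = 100) -> str:
    tokens = tokenizer.encode(prompt)
    for _ in range(max_tokens):
        logits = model(tokens)                 # next-token logits for the sequence
        next_token = int(logits[-1].argmax())  # greedy pick (temperature 0)
        if next_token == tokenizer.eos_token_id:
            break
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```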

The `<model_path>` should be either a path to a local directory or a Hugging Face repo with weights stored in `safetensors` format. If you use a repo from the Hugging Face Hub, then the model will be downloaded and cached the first time you run it. See the [Models](#models) section for a full list of supported models.
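The example resolves Hub repos for you, but if you want to locate the cached files yourself (for example, to pass the directory as `<model_path>` later), `huggingface_hub` provides `snapshot_download`. A minimal sketch:

```python
from huggingface_hub import snapshot_download

# Downloads the repo on the first call; subsequent calls return the
# cached local directory without re-downloading.
local_dir = snapshot_download(repo_id="mistralai/Mistral-7B-v0.1")
print(local_dir)  # this path also works as <model_path>
```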

Run `python generate.py --help` to see all the options.

### Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

Most Mistral, Llama, and Phi-2 style models on the Hugging Face Hub should work out of the box; `mistralai/Mistral-7B-v0.1` and `microsoft/phi-2` are two examples.
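One quick way to see which architecture family a checkpoint belongs to is the `model_type` field in its `config.json`. A hypothetical helper (the `SUPPORTED` set here is illustrative, not the example's actual dispatch logic):

```python
import json
from pathlib import Path

SUPPORTED = {"mistral", "llama", "phi"}  # illustrative, not the example's list

def model_type(model_dir: str) -> str:
    """Read the architecture family from a local checkpoint's config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    kind = config.get("model_type", "unknown")
    if kind not in SUPPORTED:
        raise ValueError(f"model_type {kind!r} is not supported by this example")
    return kind
```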

### Convert new models

You can convert (change the data type or quantize) models using the `convert.py` script. This script takes a Hugging Face repo as input and outputs a model directory (which you can optionally also upload to Hugging Face).

For example, to make a 4-bit quantized model, run:

```shell
python convert.py --hf-path <hf_repo> -q
```
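For intuition, 4-bit quantization stores each group of weights as small integers plus a per-group scale and offset. A toy NumPy sketch of the idea (MLX's actual quantization used by `convert.py` is implemented in the framework, not like this):

```python
import numpy as np

def quantize_group(w: np.ndarray, bits: int = 4):
    """Affine-quantize one weight group to `bits`-bit integers."""
    levels = 2**bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / levels or 1.0  # avoid divide-by-zero
    q = np.round((w - w_min) / scale).astype(np.uint8)  # values in [0, 15]
    return q, scale, w_min

def dequantize_group(q: np.ndarray, scale: float, w_min: float) -> np.ndarray:
    return q.astype(np.float32) * scale + w_min

w = np.random.randn(64).astype(np.float32)  # one group of 64 weights
q, scale, offset = quantize_group(w)
print(np.abs(dequantize_group(q, scale, offset) - w).max())  # small error
```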

For more options, run:

```shell
python convert.py --help
```

You can upload new models to Hugging Face by specifying `--upload-repo` to `convert.py`. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:

```shell
python convert.py \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```
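If you prefer to push a converted model directory to the Hub yourself, the equivalent with `huggingface_hub` looks roughly like this (the `mlx_model` output path is illustrative):

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the target repo if it does not exist yet, then upload the
# converted model directory to it.
api.create_repo("mlx-community/my-4bit-mistral", exist_ok=True)
api.upload_folder(folder_path="mlx_model", repo_id="mlx-community/my-4bit-mistral")
```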