Generate Text with MLX and 🤗 Hugging Face

This is an example of large language model text generation that can pull models from the Hugging Face Hub.

Setup

Install the dependencies:

pip install -r requirements.txt

Run

python generate.py --model <model_path> --prompt "hello"

For example:

python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"

will download the Mistral 7B model and generate text using the given prompt.

The <model_path> should be either a path to a local directory or a Hugging Face repo with weights stored in safetensors format. If you use a repo from the Hugging Face Hub, then the model will be downloaded and cached the first time you run it. See the Models section for a full list of supported models.
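If you prefer to fetch weights ahead of time, the download and caching is handled by the standard Hugging Face Hub client. Here is a minimal sketch, assuming the huggingface_hub package (pulled in via the transformers dependency); the allow_patterns list is an assumption about which files the example needs:

from huggingface_hub import snapshot_download

# Fetch the repo into the local Hugging Face cache. allow_patterns here is an
# assumption about which files are required; adjust as necessary.
model_path = snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",
    allow_patterns=["*.safetensors", "*.json", "*.model"],
)

# The returned directory can be passed directly as <model_path> to generate.py.
print(model_path)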

Run python generate.py --help to see all the options.
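For instance, a longer completion with a higher sampling temperature might look like the following (the flag names here are an assumption; confirm them against the --help output of your version):

python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello" --max-tokens 100 --temp 0.7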

Models

The example supports Hugging Face format Mistral- and Llama-style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

For example, mistralai/Mistral-7B-v0.1 (used above) works with this example. More generally, most Mistral- and Llama-style models should work out of the box.

Convert new models

You can convert (change the data type or quantize) models using the convert.py script. This script takes a Hugging Face repo as input and outputs a model directory (which you can optionally also upload to Hugging Face).

For example, to make a 4-bit quantized model, run:

python convert.py --hf-path <hf_repo> -q

For more options run:

python convert.py --help
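To give a feel for what 4-bit quantization does, here is a minimal standalone sketch using MLX's quantization primitives (mlx.core's quantize and dequantize). It uses a random stand-in matrix; convert.py applies the same idea to each linear layer of the model:

import mlx.core as mx

# A stand-in weight matrix; convert.py would operate on the model's real weights.
w = mx.random.normal((4096, 4096))

# Pack the weights into 4-bit integers, with one scale and bias per group of 64 values.
wq, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Recover an approximation of the original weights to inspect the quantization error.
w_hat = mx.dequantize(wq, scales, biases, group_size=64, bits=4)
print(mx.max(mx.abs(w - w_hat)))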

You can upload new models to the Hugging Face MLX Community by specifying --upload-name to convert.py.
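Under the hood the upload is a standard Hugging Face Hub operation. A rough sketch of the equivalent steps, assuming huggingface_hub's HfApi and using a hypothetical repo name and output directory:

from huggingface_hub import HfApi

api = HfApi()
repo_id = "mlx-community/my-model-4bit"  # hypothetical repo name
api.create_repo(repo_id, exist_ok=True)

# "mlx_model" stands in for the directory convert.py wrote.
api.upload_folder(folder_path="mlx_model", repo_id=repo_id)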