# Generate Text with MLX and 🤗 Hugging Face

This is an example of large language model text generation that can pull
models from the Hugging Face Hub.

## Setup

Install the dependencies:

```shell
pip install -r requirements.txt
```

## Run

```shell
python generate.py --model <model_path> --prompt "hello"
```

For example:

```shell
python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```

will download the Mistral 7B model and generate text using the given prompt.
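
Under the hood, generation is a token-by-token sampling loop. The sketch below
is illustrative rather than the actual `generate.py` code; it assumes an MLX
model that takes token ids and a key-value cache and returns logits:

```python
import mlx.core as mx

def sample(logits, temp):
    # Greedy decoding when temp == 0, otherwise sample from the softmax.
    if temp == 0:
        return mx.argmax(logits, axis=-1)
    return mx.random.categorical(logits * (1 / temp))

def generate(model, prompt, max_tokens, temp=0.0):
    # prompt: an mx.array of token ids with shape (1, prompt_length)
    y, cache = prompt, None
    for _ in range(max_tokens):
        logits, cache = model(y, cache=cache)  # assumed model signature
        y = sample(logits[:, -1, :], temp)[None]
        yield y.item()  # the next token id
```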

The `<model_path>` should be either a path to a local directory or a Hugging
Face repo with weights stored in `safetensors` format. If you use a repo from
the Hugging Face Hub, then the model will be downloaded and cached the first
time you run it. See the [Models](#models) section for a full list of
supported models.

Run `python generate.py --help` to see all the options.
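
The download and caching are handled by the Hugging Face Hub client. If you
want to pre-fetch a repo into the same cache yourself, a minimal sketch with
`huggingface_hub` (the file patterns here are an assumption about which files
the example needs):

```python
from huggingface_hub import snapshot_download

# Downloads the repo into the local Hugging Face cache on the first call;
# subsequent calls return the cached path without re-downloading.
model_path = snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",
    allow_patterns=["*.json", "*.safetensors", "tokenizer.model"],
)
print(model_path)  # this directory can be passed as --model
```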

## Models

The example supports Hugging Face format Mistral and Llama-style models. If
the model you want to run is not supported, file an issue or, better yet,
submit a pull request.

Here are a few examples of Hugging Face models that work with this example:

- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)

Most Mistral and Llama-style models should work out of the box.

## Convert new models

You can convert (change the data type or quantize) models using the
`convert.py` script. This script takes a Hugging Face repo as input and
outputs a model directory (which you can optionally also upload to Hugging
Face).

For example, to make a 4-bit quantized model, run:

```shell
python convert.py --hf-path <hf_repo> -q
```
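
For intuition, quantization in MLX packs each small group of weights into
low-bit integers along with a per-group scale and bias. A minimal sketch on a
single matrix (the group size and bit width here are illustrative and may not
match what `convert.py` uses):

```python
import mlx.core as mx

w = mx.random.normal((512, 512))

# Pack each group of 64 weights into 4-bit integers, keeping a per-group
# scale and bias so the original values can be approximately recovered.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Dequantize and check the reconstruction error.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())
```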

For more options, run:

```shell
python convert.py --help
```

You can upload new models to the Hugging Face
[MLX Community](https://huggingface.co/mlx-community) by specifying
`--upload-name` to `convert.py`.
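
If you prefer to upload a converted model yourself, the `huggingface_hub` API
can push the output directory; a sketch, where the repo name is hypothetical:

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the target repo if it doesn't exist, then upload the directory
# produced by convert.py. Requires being logged in (`huggingface-cli login`).
api.create_repo("mlx-community/my-model-4bit", exist_ok=True)
api.upload_folder(
    folder_path="path/to/converted_model",
    repo_id="mlx-community/my-model-4bit",
)
```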