## Generate Text with MLX and :hugs: Hugging Face
This is an example of large language model text generation that can pull models
from the Hugging Face Hub.
### Setup
Install the dependencies:
```
pip install -r requirements.txt
```
### Run
```
python generate.py --model <model_path> --prompt "hello"
```
For example:
```
python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```
will download the Mistral 7B model and generate text using the given prompt.
The `<model_path>` should be either a path to a local directory or a Hugging
Face repo with weights stored in `safetensors` format. If you use a repo from
the Hugging Face Hub, then the model will be downloaded and cached the first
time you run it. See the [Models](#models) section for a full list of supported models.
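For instance, to run weights you already have on disk, point `--model` at the
local directory (the path below is only a placeholder):

```
python generate.py --model ./my-local-model --prompt "hello"
```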
Run `python generate.py --help` to see all the options.
### Models
The example supports Mistral, Llama, and Phi-2 style models in Hugging Face
format. If the model you want to run is not supported, file an
[issue](https://github.com/ml-explore/mlx-examples/issues/new) or better yet,
submit a pull request.
Here are a few examples of Hugging Face models that work with this example:
- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)

Most
[Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),
[Llama](https://huggingface.co/models?library=transformers,safetensors&other=llama&sort=trending),
and
[Phi-2](https://huggingface.co/models?library=transformers,safetensors&other=phi&sort=trending)
style models should work out of the box.
### Convert new models
You can convert (change the data type or quantize) models using the
`convert.py` script. This script takes a Hugging Face repo as input and outputs
a model directory (which you can optionally also upload to Hugging Face).
For example, to make a 4-bit quantized model, run:
```
python convert.py --hf-path <hf_repo> -q
```
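The converted model is written to a local directory (assumed here to be the
default `mlx_model`; confirm the output-path option with `python convert.py
--help`), which you can pass straight back to the generate script:

```
python generate.py --model mlx_model --prompt "hello"
```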
For more options run:
```
python convert.py --help
```
You can upload new models to Hugging Face by specifying `--upload-repo` to
`convert.py`. For example, to upload a quantized Mistral-7B model to the
[MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:
```
python convert.py \
--hf-path mistralai/Mistral-7B-v0.1 \
-q \
--upload-repo mlx-community/my-4bit-mistral
```
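Once the upload finishes, the quantized model can be pulled from the Hub like
any other repo, reusing the repo name from the example above:

```
python generate.py --model mlx-community/my-4bit-mistral --prompt "hello"
```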