## Generate Text with MLX and :hugs: Hugging Face
This is an example of large language model text generation that can pull models
from the Hugging Face Hub.
### Setup
Install the dependencies:
```
pip install -r requirements.txt
```
### Run
```
python generate.py --model <model_path> --prompt "hello"
```
For example:
```
python generate.py --model mistralai/Mistral-7B-v0.1 --prompt "hello"
```
will download the Mistral 7B model and generate text using the given prompt.
The `<model_path>` should be either a path to a local directory or a Hugging
Face repo with weights stored in `safetensors` format. If you use a repo from
the Hugging Face Hub, then the model will be downloaded and cached the first
time you run it. See the [Models](#models) section for a full list of supported models.
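For instance, to run weights you already have on disk, point `--model` at the
local directory (the path below is only a placeholder):

```
python generate.py --model ./my-local-model --prompt "hello"
```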
Run `python generate.py --help` to see all the options.
### Models
The example supports Mistral, Llama, and Phi-2 style models in Hugging Face
format. If the model you want to run is not supported, file an
[issue](https://github.com/ml-explore/mlx-examples/issues/new) or better yet,
submit a pull request.
Here are a few examples of Hugging Face models that work with this example:
- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
- [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)

Most
[Mistral](https://huggingface.co/models?library=transformers,safetensors&other=mistral&sort=trending),
[Llama](https://huggingface.co/models?library=transformers,safetensors&other=llama&sort=trending),
and
[Phi-2](https://huggingface.co/models?library=transformers,safetensors&other=phi&sort=trending)
style models should work out of the box.
### Convert new models
You can convert (change the data type or quantize) models using the
`convert.py` script. This script takes a Hugging Face repo as input and outputs
a model directory (which you can optionally also upload to Hugging Face).
For example, to make a 4-bit quantized model, run:
```
python convert.py --hf-path <hf_repo> -q
```
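The converted model is written to a local directory (assumed here to be the
default `mlx_model`; confirm the output-path option with `python convert.py
--help`), which you can pass straight back to the generate script:

```
python generate.py --model mlx_model --prompt "hello"
```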
For more options run:
```
python convert.py --help
```
You can upload new models to Hugging Face by specifying `--upload-repo` to
`convert.py`. For example, to upload a quantized Mistral-7B model to the
[MLX Hugging Face community](https://huggingface.co/mlx-community) you can do:
```
python convert.py \
--hf-path mistralai/Mistral-7B-v0.1 \
-q \
--upload-repo mlx-community/my-4bit-mistral
```
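Once the upload finishes, the quantized model can be pulled from the Hub like
any other repo, reusing the repo name from the example above:

```
python generate.py --model mlx-community/my-4bit-mistral --prompt "hello"
```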