# Phi-2

Phi-2 is a 2.7B parameter language model released by Microsoft with
performance that rivals much larger models.[^1] It was trained on a mixture of
GPT-4 outputs and clean web text.

Phi-2 runs efficiently on Apple silicon devices with 8GB of memory in 16-bit
precision.
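As a rough back-of-envelope check of that claim (a sketch, not code from this repo), 16-bit precision stores 2 bytes per parameter:

```python
# Approximate memory needed to hold Phi-2's weights in fp16/bf16.
params = 2.7e9          # 2.7B parameters
bytes_per_param = 2     # 16-bit precision
gib = params * bytes_per_param / 1024**3
print(f"{gib:.1f} GiB")  # roughly 5 GiB, leaving headroom on an 8GB machine
```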
### Setup
Install the dependencies:
```
pip install -r requirements.txt
```
### Run
```
python generate.py --model <model_path> --prompt "hello"
```
For example:
```
python generate.py --model microsoft/phi-2 --prompt "hello"
```
The `<model_path>` should be either a path to a local directory or a Hugging
Face repo with weights stored in `safetensors` format. If you use a repo from
the Hugging Face Hub, then the model will be downloaded and cached the first
time you run it.
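As an illustrative sketch of that resolution logic (a hypothetical helper, not code from this repo), a `--model` argument might be handled like this:

```python
from pathlib import Path

def resolve_model_path(model: str) -> str:
    """Hypothetical sketch: a local directory is used as-is; anything
    else is treated as a Hugging Face repo id to be downloaded and
    cached on first use."""
    if Path(model).is_dir():
        return model  # local checkpoint directory, used directly
    # The real script would fetch the repo from the Hugging Face Hub
    # (e.g. via huggingface_hub.snapshot_download), which caches the
    # files locally so later runs skip the download.
    return f"(would download and cache repo '{model}')"

print(resolve_model_path("."))               # an existing directory is returned as-is
print(resolve_model_path("microsoft/phi-2"))
```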
Run `python generate.py --help` to see all the options.
### Convert new models
You can convert (change the data type or quantize) models using the
`convert.py` script. This script takes a Hugging Face repo as input and outputs
a model directory (which you can optionally also upload to Hugging Face).
For example, to make a 4-bit quantized model, run:
```
python convert.py --hf-path <hf_repo> -q
```
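The idea behind 4-bit quantization can be sketched in a few lines (illustrative only, not `convert.py`'s implementation): each group of weights is mapped to integers in [0, 15] using a per-group scale and offset.

```python
import numpy as np

def quantize_4bit(w: np.ndarray, group_size: int = 32):
    """Group-wise affine 4-bit quantization (illustrative sketch)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 2**4 - 1 representable levels
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Reconstruct an approximation of the original weights.
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, scale, w_min = quantize_4bit(w)
w_hat = dequantize(q, scale, w_min).reshape(-1)
print(np.abs(w - w_hat).max())  # small per-weight reconstruction error
```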
For more options run:
```
python convert.py --help
```
You can upload new models to the [Hugging Face MLX
Community](https://huggingface.co/mlx-community) by specifying `--upload-name`
to `convert.py`.

[^1]: For more details on the model see the [blog post](
https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
and the [Hugging Face repo](https://huggingface.co/microsoft/phi-2).