# Phi-2

Phi-2 is a 2.7B parameter language model released by Microsoft with
performance that rivals much larger models.[^1] It was trained on a mixture of
GPT-4 outputs and clean web text.

Phi-2 runs efficiently on Apple silicon devices with 8GB of memory in 16-bit
precision.
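As a rough back-of-envelope check of that claim (a sketch, not code from this repo), 16-bit precision stores 2 bytes per parameter:

```python
# Approximate memory needed to hold Phi-2's weights in fp16/bf16.
params = 2.7e9          # 2.7B parameters
bytes_per_param = 2     # 16-bit precision
gib = params * bytes_per_param / 1024**3
print(f"{gib:.1f} GiB")  # roughly 5 GiB, leaving headroom on an 8GB machine
```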
### Setup
Install the dependencies:
```
pip install -r requirements.txt
```
### Run
```
python generate.py --model <model_path> --prompt "hello"
```
For example:
```
python generate.py --model microsoft/phi-2 --prompt "hello"
```
The `<model_path>` should be either a path to a local directory or a Hugging
Face repo with weights stored in `safetensors` format. If you use a repo from
the Hugging Face Hub, then the model will be downloaded and cached the first
time you run it.
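As an illustrative sketch of that resolution logic (a hypothetical helper, not code from this repo), a `--model` argument might be handled like this:

```python
from pathlib import Path

def resolve_model_path(model: str) -> str:
    """Hypothetical sketch: a local directory is used as-is; anything
    else is treated as a Hugging Face repo id to be downloaded and
    cached on first use."""
    if Path(model).is_dir():
        return model  # local checkpoint directory, used directly
    # The real script would fetch the repo from the Hugging Face Hub
    # (e.g. via huggingface_hub.snapshot_download), which caches the
    # files locally so later runs skip the download.
    return f"(would download and cache repo '{model}')"

print(resolve_model_path("."))               # an existing directory is returned as-is
print(resolve_model_path("microsoft/phi-2"))
```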
Run `python generate.py --help` to see all the options.
### Convert new models
You can convert (change the data type or quantize) models using the
`convert.py` script. This script takes a Hugging Face repo as input and outputs
a model directory (which you can optionally also upload to Hugging Face).
For example, to make a 4-bit quantized model, run:
```
python convert.py --hf-path <hf_repo> -q
```
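The idea behind 4-bit quantization can be sketched in a few lines (illustrative only, not `convert.py`'s implementation): each group of weights is mapped to integers in [0, 15] using a per-group scale and offset.

```python
import numpy as np

def quantize_4bit(w: np.ndarray, group_size: int = 32):
    """Group-wise affine 4-bit quantization (illustrative sketch)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 2**4 - 1 representable levels
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Reconstruct an approximation of the original weights.
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, scale, w_min = quantize_4bit(w)
w_hat = dequantize(q, scale, w_min).reshape(-1)
print(np.abs(w - w_hat).max())  # small per-weight reconstruction error
```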
For more options run:
```
python convert.py --help
```
You can upload new models to the [Hugging Face MLX
Community](https://huggingface.co/mlx-community) by specifying `--upload-name`
to `convert.py`.

[^1]: For more details on the model see the [blog post](
https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
and the [Hugging Face repo](https://huggingface.co/microsoft/phi-2).