
# Phi-2

Phi-2 is a 2.7B parameter language model released by Microsoft with performance that rivals much larger models.[^1] It was trained on a mixture of GPT-4 outputs and clean web text.

Phi-2 runs efficiently on Apple silicon devices with 8GB of memory in 16-bit precision, since the 2.7B parameters take roughly 5.4GB as 16-bit weights.

## Setup

Install the dependencies:

```
pip install -r requirements.txt
```
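
As a quick sanity check that MLX installed correctly (just an illustration, not part of the example):

```
# should print the default compute device, e.g. Device(gpu, 0)
python -c "import mlx.core as mx; print(mx.default_device())"
```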

## Run

```
python generate.py --model <model_path> --prompt "hello"
```

For example:

```
python generate.py --model microsoft/phi-2 --prompt "hello"
```

The `<model_path>` should be either a path to a local directory or a Hugging Face repo with weights stored in safetensors format. If you use a repo from the Hugging Face Hub, the model will be downloaded and cached the first time you run it.
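
If you'd rather fetch the weights ahead of time instead of on the first run, one option is the `huggingface-cli` tool that ships with the `huggingface_hub` package (a sketch, assuming `huggingface_hub` is installed; `generate.py` will then reuse the cached files):

```
# download the repo into the local Hugging Face cache
huggingface-cli download microsoft/phi-2

# subsequent runs find the weights in the cache instead of re-downloading
python generate.py --model microsoft/phi-2 --prompt "hello"
```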

Run `python generate.py --help` to see all the options.

## Convert new models

You can convert models (change the data type or quantize them) using the `convert.py` script. The script takes a Hugging Face repo as input and outputs a model directory (which you can optionally also upload to Hugging Face).

For example, to make a 4-bit quantized model, run:

```
python convert.py --hf-path <hf_repo> -q
```
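
The converted model is written to a local directory which you can pass directly to `generate.py`. A sketch, assuming the script's default output directory is `mlx_model` (check `python convert.py --help` if yours differs):

```
python convert.py --hf-path microsoft/phi-2 -q

# run the quantized copy from the local output directory (path assumed above)
python generate.py --model mlx_model --prompt "hello"
```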

For more options run:

```
python convert.py --help
```

You can upload new models to the Hugging Face [MLX Community](https://huggingface.co/mlx-community) by specifying `--upload-name` to `convert.py`.
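
For example, assuming `--upload-name` takes the name of the target repo (the name below is purely illustrative):

```
# convert, quantize, and upload in one step; "phi-2-4bit" is a hypothetical repo name
python convert.py --hf-path microsoft/phi-2 -q --upload-name phi-2-4bit
```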


[^1]: For more details on the model see the blog post and the [Hugging Face repo](https://huggingface.co/microsoft/phi-2).