Added lora support for Phi-2 (#302)

* Added lora support for Phi-2

* Added Phi-2 support in fuse and convert

* format + readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Author: Yousif
Date: 2024-01-12 13:45:30 -08:00
Committed by: GitHub
Parent: 3ac731dd4f
Commit: 7575125d5d
12 changed files with 564 additions and 25 deletions
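
The core idea behind the change is standard LoRA: freeze a pretrained linear weight `W` and learn a small low-rank update, so the effective weight becomes `W + scale * (A @ B)` with only `A` and `B` trained. Below is a minimal sketch of that idea in MLX; the class name, the fixed `2.0` scale, and the commented-out wiring are illustrative assumptions in the spirit of the repository's `lora.py`, not its exact code. Phi-2's model-specific module layout (fused attention projections) is precisely the detail this commit teaches the scripts about.

```
import math

import mlx.core as mx
import mlx.nn as nn


class LoRALinear(nn.Module):
    """Wrap an nn.Linear with a trainable low-rank update (illustrative sketch)."""

    @staticmethod
    def from_linear(linear: nn.Linear, rank: int = 8):
        out_dims, in_dims = linear.weight.shape
        lora = LoRALinear(in_dims, out_dims, rank)
        lora.linear = linear  # keep the frozen base layer
        return lora

    def __init__(self, in_dims: int, out_dims: int, rank: int = 8):
        super().__init__()
        self.linear = nn.Linear(in_dims, out_dims)
        scale = 1.0 / math.sqrt(in_dims)
        # Only these two small matrices are trained.
        self.lora_a = mx.random.uniform(low=-scale, high=scale, shape=(in_dims, rank))
        self.lora_b = mx.zeros((rank, out_dims))

    def __call__(self, x):
        y = self.linear(x)                    # frozen base projection
        z = (x @ self.lora_a) @ self.lora_b   # low-rank update
        return y + 2.0 * z


# Hypothetical wiring: freeze the model, then swap the attention projections of
# the last few blocks. The attribute names below are placeholders; Phi-2's actual
# module layout differs, which is why lora.py needs model-specific handling.
# model.freeze()
# for block in model.layers[-4:]:
#     block.attn.q_proj = LoRALinear.from_linear(block.attn.q_proj)
#     block.attn.v_proj = LoRALinear.from_linear(block.attn.v_proj)
```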

lora/README.md

@@ -2,7 +2,7 @@
This is an example of using MLX to fine-tune an LLM with low rank adaptation
(LoRA) for a target task.[^lora] The example also supports quantized LoRA
- (QLoRA).[^qlora] The example works with Llama and Mistral style
+ (QLoRA).[^qlora] The example works with Llama, Mistral, and Phi-2 style
models available on Hugging Face.
In this example we'll use the WikiSQL[^wikisql] dataset to train the LLM to
@@ -81,7 +81,7 @@ To fine-tune a model use:
```
python lora.py --model <path_to_model> \
--train \
- --iters 600
+ --iters 600 \
```
If `--model` points to a quantized model, then the training will use QLoRA,
@@ -100,7 +100,7 @@ To compute test set perplexity use:
```
python lora.py --model <path_to_model> \
--adapter-file <path_to_adapters.npz> \
- --test
+ --test \
```
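
The perplexity that `--test` reports is the exponential of the average per-token negative log-likelihood over the test set. A minimal sketch of that definition in MLX, leaving out the batching and loss-masking details the script itself handles:

```
import mlx.core as mx
import mlx.nn as nn


def perplexity(logits, targets):
    # logits: (batch, seq, vocab); targets: (batch, seq) token ids
    nll = nn.losses.cross_entropy(logits, targets)  # per-token negative log-likelihood
    return mx.exp(nll.mean())
```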
### Generate
@@ -114,7 +114,7 @@ python lora.py --model <path_to_model> \
--prompt "table: 1-10015132-16
columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team
Q: What is terrence ross' nationality
- A: "
+ A: " \
```
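
Per the commit message, `fuse.py` also learned about Phi-2. Conceptually, fusing folds the learned low-rank update back into the frozen weight so generation needs no adapter code at all. A sketch of that fold, assuming the hypothetical `LoRALinear` layout and `2.0` scale from the earlier sketch; the real `fuse.py` deals with details this ignores:

```
def fuse_lora(lora):
    # linear.weight has shape (out_dims, in_dims); the update (lora_a @ lora_b)
    # has shape (in_dims, out_dims), so it is transposed before being folded in.
    linear = lora.linear
    linear.weight = linear.weight + 2.0 * (lora.lora_a @ lora.lora_b).T
    return linear
```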
## Results
@@ -211,7 +211,7 @@ python lora.py \
--model mistralai/Mistral-7B-v0.1 \
--train \
--batch-size 1 \
- --lora-layers 4
+ --lora-layers 4 \
```
The above command on an M1 Max with 32 GB runs at about 250 tokens-per-second.
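
The same arithmetic explains why trimming `--lora-layers` keeps memory low: each wrapped projection adds only `rank * (in_dims + out_dims)` trainable weights. A back-of-the-envelope count for Phi-2 (hidden size 2560), using an assumed rank of 8 and query/value projections in the last 4 blocks; the numbers are illustrative, not measurements from the repository:

```
# Back-of-the-envelope count of trainable LoRA weights (illustrative, not measured).
hidden = 2560            # Phi-2 model dimension
rank = 8                 # assumed LoRA rank
wrapped = 2 * 4          # e.g. query and value projections in the last 4 blocks
trainable = wrapped * rank * (hidden + hidden)
print(f"{trainable:,} trainable parameters")  # 327,680, a tiny fraction of Phi-2's 2.7B
```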