# Fine-Tuning with LoRA or QLoRA
This is an example of using MLX to fine-tune either a Llama 7B[^llama] or a
Mistral 7B[^mistral] model with low-rank adaptation (LoRA)[^lora] for a target
task. The example also supports quantized LoRA (QLoRA).[^qlora]

In this example we'll use the WikiSQL[^wikisql] dataset to train the LLM to
generate SQL queries from natural language. However, the example is intended to
be general should you wish to use a custom dataset.
## Contents

* [Setup](#setup)
* [Run](#run)
  * [Fine-tune](#fine-tune)
  * [Evaluate](#evaluate)
  * [Generate](#generate)
* [Results](#results)
* [Custom Data](#custom-data)
* [Memory Issues](#memory-issues)
## Setup
Install the dependencies:
```
pip install -r requirements.txt
```
Next, download and convert the model. The Mistral weights can be downloaded with:
```
curl -O https://files.mistral-7b-v0-1.mistral.ai/mistral-7B-v0.1.tar
tar -xf mistral-7B-v0.1.tar
```
If you do not have access to the Llama weights you will need to [request
access](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
from Meta.

Convert the model with:
```
python convert.py \
--torch-path <path_to_torch_model> \
--mlx-path <path_to_mlx_model>
```
If you wish to use QLoRA, then convert the model with 4-bit quantization using
the `-q` option.
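For instance, keeping the same placeholder paths as above, a quantized conversion would combine the flags like so (a sketch of the documented options, not additional ones):

```shell
python convert.py \
    --torch-path <path_to_torch_model> \
    --mlx-path <path_to_mlx_model> \
    -q
```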
## Run
The main script is `lora.py`. To see a full list of options, run:
```
python lora.py --help
```
### Fine-tune
To fine-tune a model, use:
```
python lora.py --model <path_to_model> \
--train \
--iters 600
```
If `--model` points to a quantized model, then the training will use QLoRA;
otherwise, it will use regular LoRA.
Note that the model path should contain the MLX weights, the tokenizer, and
the `config.json`, all of which are output by the `convert.py` script.

By default, the adapter weights are saved in `adapters.npz`. You can specify
the output location with `--adapter-file`.

You can resume fine-tuning with an existing adapter with `--resume-adapter-file
<path_to_adapters.npz>`.
### Evaluate
2023-11-30 06:14:11 +08:00
To compute test set perplexity, use:
```
python lora.py --model <path_to_model> \
--adapter-file <path_to_adapters.npz> \
--test
```
### Generate
For generation, use:
```
python lora.py --model <path_to_model> \
--adapter-file <path_to_adapters.npz> \
--num-tokens 50 \
--prompt "table: 1-10015132-16
columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team
Q: What is terrence ross' nationality
A: "
```
## Results
The initial validation loss for Llama 7B on WikiSQL is 2.66 and the final
validation loss after 1000 iterations is 1.23. The table below shows the
training and validation loss at a few points over the course of training.

| Iteration | Train Loss | Validation Loss |
| --------- | ---------- | --------------- |
| 1 | N/A | 2.659 |
| 200 | 1.264 | 1.405 |
| 400 | 1.201 | 1.303 |
| 600 | 1.123 | 1.274 |
| 800 | 1.017 | 1.255 |
| 1000 | 1.070 | 1.230 |
The model trains at around 475 tokens per second on an M2 Ultra.
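Assuming these losses are average per-token cross-entropies (which is how a language-model perplexity is usually defined), you can read them as perplexities via `exp(loss)` — a quick sketch:

```python
import math

def perplexity(loss: float) -> float:
    # Perplexity is the exponential of the average
    # per-token cross-entropy loss.
    return math.exp(loss)

# Validation losses from the table above:
print(f"{perplexity(2.659):.2f}")  # initial, ~14.28
print(f"{perplexity(1.230):.2f}")  # after 1000 iterations, ~3.42
```

So fine-tuning drops the model's effective uncertainty from roughly 14 candidate tokens to about 3.4 per position on this task.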
## Custom Data
You can make your own dataset for fine-tuning with LoRA. You can specify the
dataset with `--data=<my_data_directory>`. Check the subdirectory `data/` to
see the expected format.

For fine-tuning (`--train`), the data loader expects a `train.jsonl` and a
`valid.jsonl` to be in the data directory. For evaluation (`--test`), the data
loader expects a `test.jsonl` in the data directory. Each line in the `*.jsonl`
file should look like:
```
{"text": "This is an example for the model."}
```
Note that any other keys will be ignored by the loader.
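For instance, you could generate these files with a few lines of Python using only the standard library (the record below is a made-up example in the prompt format used earlier):

```python
import json

# A made-up record in the same prompt style as the WikiSQL example above.
examples = [
    {
        "text": (
            "table: 1-1000181-1\n"
            "columns: State, Capital\n"
            "Q: What is the capital of Texas?\n"
            "A: SELECT Capital FROM 1-1000181-1 WHERE State = 'Texas'"
        )
    },
]

# Write one JSON object per line, as the data loader expects.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```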
## Memory Issues
Fine-tuning a large model with LoRA requires a machine with a decent amount
of memory. Here are some tips to reduce memory use should you need to do so:

1. Try quantization (QLoRA). You can use QLoRA by generating a quantized model
with `convert.py` and the `-q` flag. See the [Setup](#setup) section for
more details.
2. Try using a smaller batch size with `--batch-size`. The default is `4`, so
   setting this to `2` or `1` will reduce memory consumption at the cost of
   some speed.
3. Reduce the number of layers to fine-tune with `--lora-layers`. The default
   is `16`, so you can try `8` or `4`. This reduces the amount of memory
   needed for backpropagation. It may also reduce the quality of the
   fine-tuned model if you are fine-tuning with a lot of data.
4. Longer examples require more memory. If it makes sense for your data, break
   your examples into smaller sequences when making the
   `{train, valid, test}.jsonl` files.
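The splitting in tip 4 can be sketched as a simple whitespace-based chunker. Note that `max_words` is a hypothetical knob and word counts only approximate token counts (the real tokenizer will count differently), so treat this as a rough starting point:

```python
def chunk_text(text: str, max_words: int = 64) -> list[str]:
    # Split on whitespace and regroup into chunks of at most
    # max_words words each.
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Each chunk would then become its own {"text": ...} line in train.jsonl.
chunks = chunk_text("a very long training example " * 50, max_words=64)
```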
For example, on a machine with 32 GB of memory, the following should run
reasonably fast:
```
python lora.py \
--model <path_to_model> \
--train \
--batch-size 1 \
--lora-layers 4
```
The above command on an M1 Max with 32 GB runs at about 250 tokens per second.

[^lora]: Refer to the [arXiv paper](https://arxiv.org/abs/2106.09685) for more details on LoRA.
[^qlora]: Refer to the paper [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314) for more details.
[^llama]: Refer to the [arXiv paper](https://arxiv.org/abs/2302.13971) and [blog post](https://ai.meta.com/blog/large-language-model-llama-meta-ai/) for more details.
[^mistral]: Refer to the [blog post](https://mistral.ai/news/announcing-mistral-7b/) and [GitHub repository](https://github.com/mistralai/mistral-src) for more details.
[^wikisql]: Refer to the [GitHub repo](https://github.com/salesforce/WikiSQL/tree/master) for more information about WikiSQL.