MiniCPM implementation (#685)

* Added support for the MiniCPM architecture

* Updated utils.py and LORA.md

* Updated implementation details for the MiniCPM architecture

* Cleaning up

* Fixed the missing `lm_head` layer problem

* Refactored the Model class to dynamically handle tied and untied word embeddings (see the sketch below)

* Added a dynamic RoPE scaling base calculation (see the sketch below)

* Quick fix and clean up

* Clean up again

* Removed the MiniCPMNorm class as it's not used

* Format

* Version bump

---------

Co-authored-by: Awni Hannun <awni@apple.com>
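
The tied/untied embedding refactor called out above follows a common MLX pattern: reuse the input embedding matrix as the output projection when weights are tied, and only create `lm_head` when they are not. A minimal sketch, assuming hypothetical names (`TiedOrUntiedHead` and its arguments are illustrative, not the PR's exact code):

```python
import mlx.core as mx
import mlx.nn as nn


class TiedOrUntiedHead(nn.Module):
    """Illustrative output head; not the PR's exact class."""

    def __init__(self, hidden_size: int, vocab_size: int, tie_word_embeddings: bool):
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        if not tie_word_embeddings:
            # Untied checkpoints ship a separate output projection; creating it
            # only in this branch avoids the "missing lm_head layer" problem
            # when loading tied checkpoints.
            self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def __call__(self, hidden: mx.array) -> mx.array:
        if self.tie_word_embeddings:
            # Tied: reuse the input embedding matrix as the output projection.
            return self.embed_tokens.as_linear(hidden)
        return self.lm_head(hidden)


# Usage: map a batch of hidden states to vocabulary logits.
head = TiedOrUntiedHead(hidden_size=64, vocab_size=100, tie_word_embeddings=True)
logits = head(mx.zeros((1, 8, 64)))
print(logits.shape)  # (1, 8, 100)
```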
Author: Gökdeniz Gülmez (committed by GitHub)
Date: 2024-04-26 00:29:28 +02:00
Commit: 2c1c9e9024 (parent: 685012c2ad)
4 changed files with 251 additions and 22 deletions
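
For the dynamic RoPE scaling base calculation mentioned in the commit list, a hedged sketch of the standard NTK-aware form is below; whether the PR computes the base exactly this way is an assumption:

```python
def dynamic_rope_base(base: float, scaling_factor: float, head_dim: int) -> float:
    """NTK-aware base adjustment: stretch rotary periods by growing the base.

    `base` is the original rope theta (e.g. 10000.0) and `scaling_factor` is
    the desired context-length multiplier. This is the common dynamic-NTK
    formula, shown for illustration.
    """
    return base * scaling_factor ** (head_dim / (head_dim - 2))


# Example: adjust a 10000.0 base for 4x longer context with 64-dim heads.
print(dynamic_rope_base(10000.0, 4.0, 64))  # ~41829.0
```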

LORA.md

@@ -11,16 +11,17 @@ LoRA (QLoRA).[^qlora] LoRA fine-tuning works with the following model families:
 - Qwen2
 - Gemma
 - OLMo
+- MiniCPM
 
 ## Contents
 
-* [Run](#Run)
-* [Fine-tune](#Fine-tune)
-* [Evaluate](#Evaluate)
-* [Generate](#Generate)
-* [Fuse](#Fuse)
-* [Data](#Data)
-* [Memory Issues](#Memory-Issues)
+- [Run](#Run)
+- [Fine-tune](#Fine-tune)
+- [Evaluate](#Evaluate)
+- [Generate](#Generate)
+- [Fuse](#Fuse)
+- [Data](#Data)
+- [Memory Issues](#Memory-Issues)
 
 ## Run
 
@@ -122,7 +123,7 @@ To upload a fused model, supply the `--upload-repo` and `--hf-path` arguments
 to `mlx_lm.fuse`. The latter is the repo name of the original model, which is
 useful for the sake of attribution and model versioning.
 
-For example, to fuse and upload a model derived from Mistral-7B-v0.1, run:
+For example, to fuse and upload a model derived from Mistral-7B-v0.1, run:
 
 ```shell
 mlx_lm.fuse \
@@ -144,38 +145,54 @@ can specify the file name with `--gguf-path`.
 
 ## Data
 
-The LoRA command expects you to provide a dataset with `--data`. The MLX
+The LoRA command expects you to provide a dataset with `--data`. The MLX
 Examples GitHub repo has an [example of the WikiSQL
 data](https://github.com/ml-explore/mlx-examples/tree/main/lora/data) in the
 correct format.
 
 For fine-tuning (`--train`), the data loader expects a `train.jsonl` and a
 `valid.jsonl` to be in the data directory. For evaluation (`--test`), the data
-loader expects a `test.jsonl` in the data directory.
+loader expects a `test.jsonl` in the data directory.
 
 Currently, `*.jsonl` files support three data formats: `chat`,
 `completions`, and `text`. Here are three examples of these formats:
 
 `chat`:
 
 ```jsonl
-{"messages": [
-  {"role": "system", "content": "You are a helpful assistant." },
-  {"role": "user", "content": "Hello."},
-  {"role": "assistant", "content": "How can I assistant you today."},
-]}
+{
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful assistant."
+    },
+    {
+      "role": "user",
+      "content": "Hello."
+    },
+    {
+      "role": "assistant",
+      "content": "How can I assistant you today."
+    }
+  ]
+}
 ```
 
 `completions`:
 
 ```jsonl
-{"prompt": "What is the capital of France?", "completion": "Paris."}
+{
+  "prompt": "What is the capital of France?",
+  "completion": "Paris."
+}
 ```
 
 `text`:
 
 ```jsonl
-{"text": "This is an example for the model."}
+{
+  "text": "This is an example for the model."
+}
 ```
 
 Note, the format is automatically determined by the dataset. Note also, keys in
@@ -207,7 +224,7 @@ of memory. Here are some tips to reduce memory use should you need to do so:
 
 1. Try quantization (QLoRA). You can use QLoRA by generating a quantized model
    with `convert.py` and the `-q` flag. See the [Setup](#setup) section for
-   more details.
+   more details.
 
 2. Try using a smaller batch size with `--batch-size`. The default is `4` so
    setting this to `2` or `1` will reduce memory consumption. This may slow
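
The quantized conversion in tip 1 can also be driven from Python. A minimal sketch, assuming `mlx_lm` exposes `convert` with these keyword arguments (the model path is illustrative):

```python
from mlx_lm import convert

convert(
    "mistralai/Mistral-7B-v0.1",  # Hugging Face repo to convert (illustrative)
    mlx_path="mlx_model_q",       # output directory for the converted weights
    quantize=True,                # write a quantized model suitable for QLoRA
)
```
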
@@ -244,6 +261,5 @@ tokens-per-second, using the MLX Example
 [`wikisql`](https://github.com/ml-explore/mlx-examples/tree/main/lora/data)
 data set.
 
-
 [^lora]: Refer to the [arXiv paper](https://arxiv.org/abs/2106.09685) for more details on LoRA.
 [^qlora]: Refer to the paper [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)