LoRA: Support HuggingFace dataset via data parameter (#996)

* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
madroid
2024-09-30 22:36:21 +08:00
committed by GitHub
parent 50e5ca81a8
commit aa1c8abdc6
3 changed files with 93 additions and 51 deletions
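
The behavior this commit describes, where `--data` accepts either a local directory or a Hugging Face dataset ID, can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the commit: `resolve_data_source` is a hypothetical helper, and the rule that an existing local path takes precedence over a Hub ID is an assumption.

```python
from pathlib import Path
from typing import Tuple


def resolve_data_source(data: str) -> Tuple[str, str]:
    """Classify a --data value as a local directory or a Hugging Face dataset ID.

    Hypothetical helper for illustration only; it is not part of MLX LM.
    Assumption: a path that exists on disk wins over a Hub ID of the same name.
    """
    path = Path(data)
    if path.exists():
        # Existing path: treat it as a local dataset directory.
        return ("local", str(path))
    # Anything else is treated as a Hub dataset ID such as
    # "mlx-community/wikisql"; actually loading it would require
    # the `datasets` package.
    return ("hf", data)


if __name__ == "__main__":
    print(resolve_data_source("mlx-community/wikisql"))
```

A real loader would also need to handle a failed Hub download, which the commit message lists as a separate change ("handle load hf dataset error").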

@@ -251,7 +251,13 @@ To use Hugging Face datasets, first install the `datasets` package:
pip install datasets
```
-Specify the Hugging Face dataset arguments in a YAML config. For example:
+If the Hugging Face dataset is already in a supported format, you can specify
+it on the command line. For example, pass `--data mlx-community/wikisql` to
+train on the pre-formatted WikiSQL data.
+Otherwise, provide a mapping of keys in the dataset to the features MLX LM
+expects. Use a YAML config to specify the Hugging Face dataset arguments. For
+example:
```
hf_dataset: