LoRA: Support HuggingFace dataset via data parameter (#996)

* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
madroid
2024-09-30 22:36:21 +08:00
committed by GitHub
parent 50e5ca81a8
commit aa1c8abdc6
3 changed files with 93 additions and 51 deletions
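As a rough illustration of the behavior the commit message describes (a `--data` value that is either a local directory or the name of a Hugging Face dataset), the dispatch could be sketched as below. This is not the code from the commit: `resolve_data_source` is a hypothetical helper introduced here for illustration, and the rule "an existing directory is local, anything else is treated as a Hugging Face dataset name" is an assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Only the --data argument is reproduced here; the real parser has more options.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help=(
            "Directory with {train, valid, test}.jsonl files or the name "
            "of a Hugging Face dataset (e.g., 'mlx-community/wikisql')"
        ),
    )
    return parser


def resolve_data_source(data: str) -> str:
    """Hypothetical helper: classify the --data value.

    Assumption: an existing local directory takes precedence; any other
    value is treated as a Hugging Face dataset name.
    """
    return "local" if Path(data).is_dir() else "hf"
```

With this sketch, a value that names an existing directory would be read from disk as `{train, valid, test}.jsonl` files, and any other value would be passed on to whatever Hugging Face loading path the training script uses.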

@@ -79,7 +79,10 @@ def build_parser():
     parser.add_argument(
         "--data",
         type=str,
-        help="Directory with {train, valid, test}.jsonl files",
+        help=(
+            "Directory with {train, valid, test}.jsonl files or the name "
+            "of a Hugging Face dataset (e.g., 'mlx-community/wikisql')"
+        ),
     )
     parser.add_argument(
         "--fine-tune-type",