LoRA: Support HuggingFace dataset via data parameter (#996)

* LoRA: support huggingface dataset via `data` argument
* LoRA: Extract the load_custom_hf_dataset function
* LoRA: split small functions
* fix spelling errors
* handle load hf dataset error
* fix pre-commit lint
* update data argument help
* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-09-01 12:49:50 +08:00 · 2024-09-30 22:36:21 +08:00
parent 50e5ca81a8
commit aa1c8abdc6
3 changed files with 93 additions and 51 deletions
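The new `--data` help text says the value may be either a directory of `{train, valid, test}`.jsonl files or a Hugging Face dataset name. What follows is a minimal sketch of how such a dual-purpose argument could be dispatched. It assumes a simple rule where an existing local directory takes precedence and any other string is treated as a Hub dataset name. The helpers `resolve_data_source`, `local_split_paths`, and `load_hf_dataset` are hypothetical. They are not the `load_custom_hf_dataset` function this commit introduces, whose signature is not shown in this excerpt. Only `datasets.load_dataset` is a real library call.

```python
import argparse
import os
from typing import Dict, Tuple

SPLITS = ("train", "valid", "test")


def build_parser() -> argparse.ArgumentParser:
    # Only the --data option from the diff; all other LoRA options omitted.
    parser = argparse.ArgumentParser(description="LoRA --data option (sketch)")
    parser.add_argument(
        "--data",
        type=str,
        help=(
            "Directory with {train, valid, test}.jsonl files or the name "
            "of a Hugging Face dataset (e.g., 'mlx-community/wikisql')"
        ),
    )
    return parser


def resolve_data_source(data: str) -> Tuple[str, str]:
    # Hypothetical rule (assumption): an existing local directory wins;
    # any other string is treated as a Hugging Face dataset name.
    if os.path.isdir(data):
        return "local", data
    return "hf", data


def local_split_paths(data_dir: str) -> Dict[str, str]:
    # Paths of the {train, valid, test}.jsonl files named in the help text.
    return {split: os.path.join(data_dir, f"{split}.jsonl") for split in SPLITS}


def load_hf_dataset(name: str):
    # Requires the third-party `datasets` package; imported lazily so the
    # rest of this sketch runs without it installed.
    from datasets import load_dataset

    return load_dataset(name)


if __name__ == "__main__":
    args = build_parser().parse_args(["--data", "mlx-community/wikisql"])
    kind, source = resolve_data_source(args.data)
    print(kind, source)
```

Checking for a local directory first keeps existing workflows (a folder of `.jsonl` files) unchanged, while any value that is not a local path falls through to the Hub. One trade-off of this assumed rule is that a local directory whose relative path happens to match a Hub dataset name would shadow the remote dataset.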
--- a/llms/mlx_lm/lora.py
+++ b/llms/mlx_lm/lora.py
@@ -79,7 +79,10 @@ def build_parser():
    parser.add_argument(
        "--data",
        type=str,
-        help="Directory with {train, valid, test}.jsonl files",
+        help=(
+            "Directory with {train, valid, test}.jsonl files or the name "
+            "of a Hugging Face dataset (e.g., 'mlx-community/wikisql')"
+        ),
    )
    parser.add_argument(
        "--fine-tune-type",