LoRA: Support HuggingFace dataset via data parameter (#996)

* LoRA: support huggingface dataset via `data` argument
* LoRA: Extract the load_custom_hf_dataset function
* LoRA: split small functions
* fix spelling errors
* handle load hf dataset error
* fix pre-commit lint
* update data argument help
* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-12-16 02:08:55 +08:00 · 2024-09-30 22:36:21 +08:00
parent 50e5ca81a8
commit aa1c8abdc6
3 changed files with 93 additions and 51 deletions
--- a/llms/mlx_lm/LORA.md
+++ b/llms/mlx_lm/LORA.md
@@ -251,7 +251,13 @@ To use Hugging Face datasets, first install the `datasets` package:
 pip install datasets
 ```

-Specify the Hugging Face dataset arguments in a YAML config. For example:
+If the Hugging Face dataset is already in a supported format, you can specify
+it on the command line. For example, pass `--data mlx-community/wikisql` to
+train on the pre-formatted WikiSQL data.
+
+Otherwise, provide a mapping of keys in the dataset to the features MLX LM
+expects. Use a YAML config to specify the Hugging Face dataset arguments. For
+example:

 ```
 hf_dataset: