mirror of
https://github.com/ml-explore/mlx-examples.git
synced 2025-09-01 04:14:38 +08:00
Configuration-based use of HF hub-hosted datasets for training (#701)
* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training
* Pre-commit formatting
* Fix YAML config example
* Print DS info
* Include name
* Add hf_dataset parameter default
* Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility.
* nits
* update docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
@@ -69,3 +69,11 @@ lora_parameters:
# warmup: 100 # 0 for no warmup
# warmup_init: 1e-7 # 0 if not specified
# arguments: [1e-5, 1000, 1e-7] # passed to scheduler

#hf_dataset:
# name: "billsum"
# train_split: "train[:1000]"
# valid_split: "train[-100:]"
# prompt_feature: "text"
# completion_feature: "summary"
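To illustrate how the hf_dataset keys in the hunk above might be consumed, here is a minimal sketch. It is not the code from this commit: `to_pairs`, `load_splits`, and `fake_load` are hypothetical names, and the loader is injected so the example runs offline. Passing `datasets.load_dataset` as `load_fn` is an assumption about how the hub would be queried.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

Pair = Tuple[str, str]

# The hf_dataset block from the config, uncommented and parsed into a dict.
hf_config: Dict[str, str] = {
    "name": "billsum",
    "train_split": "train[:1000]",
    "valid_split": "train[-100:]",
    "prompt_feature": "text",
    "completion_feature": "summary",
}


def to_pairs(
    records: Iterable[Mapping[str, str]], prompt_key: str, completion_key: str
) -> List[Pair]:
    """Pick the prompt and completion fields out of each record (hypothetical helper)."""
    return [(r[prompt_key], r[completion_key]) for r in records]


def load_splits(
    config: Mapping[str, str], load_fn: Callable[..., Iterable[Mapping[str, str]]]
) -> Tuple[List[Pair], List[Pair]]:
    """Load the train and validation splits named in the config.

    load_fn is called as load_fn(name, split=...); it is injected here so the
    sketch does not need network access.
    """
    pk, ck = config["prompt_feature"], config["completion_feature"]
    train = to_pairs(load_fn(config["name"], split=config["train_split"]), pk, ck)
    valid = to_pairs(load_fn(config["name"], split=config["valid_split"]), pk, ck)
    return train, valid


def fake_load(name: str, split: str) -> List[Dict[str, str]]:
    """Stand-in for a hub download: two tiny records tagged with the split string."""
    return [
        {"text": f"{split} doc {i}", "summary": f"sum {i}", "title": "unused"}
        for i in range(2)
    ]


train, valid = load_splits(hf_config, fake_load)
print(train[0])
```

Only the two configured features are read from each record, so any extra columns in the dataset (such as `title` in the stand-in data) are ignored.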