Commit Graph

17 Commits

Author SHA1 Message Date
Chime Ogbuji
cb87f6f22c Add response template (or token) argument
For use in calculating the mask for everything up to the point after the response prompt (i.e., where the continuation/completion begins)
2025-02-09 07:43:01 -08:00
Chime Ogbuji
f989401881 Default for hf_datasets configuration 2025-02-09 07:41:24 -08:00
Chime Ogbuji
69282ab7fc Minor fix 2025-02-09 07:41:24 -08:00
Chime Ogbuji
4890870053 Add ability to fetch raw prompt and completion text from completion datasets 2025-02-09 07:41:23 -08:00
Chime Ogbuji
a5b866cf73 Fix index calculation 2025-02-09 07:41:01 -08:00
Chime Ogbuji
a4a86ad898 Fix iteration over HF dataset collection 2025-02-09 07:41:01 -08:00
Chime Ogbuji
78c33e5037 Fix keyword argument invocation 2025-02-09 07:41:00 -08:00
Chime Ogbuji
387c45efa2 Fixes to references to hf_datasets 2025-02-09 07:40:09 -08:00
Chime Ogbuji
14a75f3f03 Generalize HF datasets to a collection of HF datasets via datasets, add support for custom chat HF datasets (#1088), and fix (#1087) 2025-02-09 07:38:40 -08:00
Chime Ogbuji
79a042768f Replace iterate_input_masked_batches with iterate_delineated_batches, an updated attempt to better sync with iterate_batches logic 2025-02-09 07:12:54 -08:00
Victor Nogueira
df1406735b
Fix dataset variable name, in datasets.py (#1212) 2025-01-21 14:12:43 -08:00
Chime Ogbuji
0228c46434
Custom local dataset features (#1085)
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.

* Persist configured prompt/completion key

* rebase + nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 10:01:18 -08:00
Awni Hannun
c4833a2f55
fix encoding with special tokens + chat template (#1189) 2025-01-03 10:50:59 -08:00
madroid
aa1c8abdc6
LoRA: Support HuggingFace dataset via data parameter (#996)
* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
madroid
7ec2021bb9
LoRA: support tools (function calling) format datasets (#995)
* LoRA: support fine-tuning tools datasets

* LoRA: Split small function

* LoRA: add tools format to lora docs

* LoRA: pre-commit fix

* Revert "LoRA: pre-commit fix"

This reverts commit b94b7e0fe7.

* Revert "LoRA: Split small function"

This reverts commit 3f6a5f19fd.

* LoRA: remove ToolsDataset

In a JSONL file, not all data is required to include the tools value.

* nit in readme

* nit in readme

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
Chime Ogbuji
df6bc09d74
Configuration-based use of HF hub-hosted datasets for training (#701)
* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training

* Pre-commit formatting

* Fix YAML config example

* Print DS info

* Include name

* Add hf_dataset parameter default

* Remove TextHFDataset and CompletionsHFDataset and use Dataset and CompletionsDataset instead, adding a text_key constructor argument to the former (and changing it to work with a provided data structure instead of just from a JSON file), and prompt_key and completion_key arguments to the latter with defaults for backwards compatibility.

* nits

* update docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-26 10:20:50 -07:00
madroid
b0bcd86a40
Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: formatting dataset timing to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check Unsupported data format

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00