Commit Graph

23 Commits

Author SHA1 Message Date
Awni Hannun
eda597bdef simplify 2025-02-09 19:37:11 -08:00
Awni Hannun
bb2c8bcf96 more nits 2025-02-09 18:00:17 -08:00
Awni Hannun
6e9542a934 put offset in prompt, simplify 2025-02-09 17:31:23 -08:00
Awni Hannun
6ace6dc6b2 simplify collections 2025-02-09 08:33:42 -08:00
Chime Ogbuji
b9748e9ee4 Generalize the get_item method to all CompletionDatasets 2025-02-09 07:44:17 -08:00
Chime Ogbuji
7989d0a874 Move response template to LoRA configuration 2025-02-09 07:43:37 -08:00
Chime Ogbuji
cb87f6f22c Add response template (or token) argument
Used to calculate a mask covering everything up to and including the response prompt, so the loss is computed only on what follows (the continuation/completion)
2025-02-09 07:43:01 -08:00
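To make the intent of the response-template mask concrete, here is a minimal sketch in Python. The function name, the token-level template matching, and the 0/1 mask convention are illustrative assumptions, not the code added in this commit.

```python
from typing import List


def completion_mask(tokens: List[int], response_template: List[int]) -> List[int]:
    """Return a 0/1 mask: 0 through the end of the response template,
    1 for every token after it (the continuation/completion)."""
    n, m = len(tokens), len(response_template)
    boundary = 0
    for i in range(n - m + 1):
        if tokens[i : i + m] == response_template:
            boundary = i + m  # mask everything up to the end of the template
            break
    return [0] * boundary + [1] * (n - boundary)


# Example: template tokens [42, 43] mark where the assistant response begins.
print(completion_mask([1, 2, 42, 43, 7, 8, 9], [42, 43]))  # [0, 0, 0, 0, 1, 1, 1]
```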
Chime Ogbuji
f989401881 Default for hf_datasets configuration 2025-02-09 07:41:24 -08:00
Chime Ogbuji
69282ab7fc Minor fix 2025-02-09 07:41:24 -08:00
Chime Ogbuji
4890870053 Add ability to fetch raw prompt and completion text from completion datasets 2025-02-09 07:41:23 -08:00
Chime Ogbuji
a5b866cf73 Fix index calculation 2025-02-09 07:41:01 -08:00
Chime Ogbuji
a4a86ad898 Fix iteration over HF dataset collection 2025-02-09 07:41:01 -08:00
Chime Ogbuji
78c33e5037 Fix keyword argument invocation 2025-02-09 07:41:00 -08:00
Chime Ogbuji
387c45efa2 Fixes to references to hf_datasets 2025-02-09 07:40:09 -08:00
Chime Ogbuji
14a75f3f03 Generalize HF datasets to a collection of HF datasets via datasets, add support for custom chat HF datasets (#1088), and fix (#1087) 2025-02-09 07:38:40 -08:00
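A rough idea of what a configuration with a collection of HF datasets might load into. Every key and dataset name below (datasets, hf_dataset, name, prompt_feature, completion_feature) is an assumption inferred from this and the surrounding commits, not taken from the branch itself.

```python
# Hypothetical config shape for a collection of HF datasets (assumed keys),
# expressed as the Python dict a YAML config would load into.
config = {
    "datasets": [
        {"hf_dataset": {"name": "org/dataset-a",
                        "prompt_feature": "question",
                        "completion_feature": "answer"}},
        {"hf_dataset": {"name": "org/dataset-b",
                        "prompt_feature": "instruction",
                        "completion_feature": "output"}},
    ],
}
```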
Chime Ogbuji
79a042768f Replace iterate_input_masked_batches with iterate_delineated_batches, an updated attempt to better sync with iterate_batches logic 2025-02-09 07:12:54 -08:00
Victor Nogueira
df1406735b Fix dataset variable name in datasets.py (#1212) 2025-01-21 14:12:43 -08:00
Chime Ogbuji
0228c46434 Custom local dataset features (#1085)
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.

* Persist configured prompt/completion key

* rebase + nits

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2025-01-13 10:01:18 -08:00
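A hedged sketch of the idea behind generalizing prompt_feature and completion_feature: a local JSONL file keeps its own field names, and the configured keys map each record onto a prompt/completion pair. The loader below is illustrative only and is not the mlx-lm implementation.

```python
import json

# Illustrative only: read local JSONL records whose prompt/completion live
# under user-configured keys (the prompt_feature/completion_feature idea).
def load_completions(path, prompt_feature="prompt", completion_feature="completion"):
    pairs = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            pairs.append((record[prompt_feature], record[completion_feature]))
    return pairs

# e.g. a dataset stored as {"question": ..., "answer": ...} per line:
# load_completions("train.jsonl", prompt_feature="question", completion_feature="answer")
```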
Awni Hannun
c4833a2f55 fix encoding with special tokens + chat template (#1189) 2025-01-03 10:50:59 -08:00
madroid
aa1c8abdc6 LoRA: Support HuggingFace dataset via data parameter (#996)
* LoRA: support huggingface dataset via `data` argument

* LoRA: Extract the load_custom_hf_dataset function

* LoRA: split small functions

* fix spelling errors

* handle load hf dataset error

* fix pre-commit lint

* update data argument help

* nits and doc

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-30 07:36:21 -07:00
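A hedged sketch of the mechanism the PR title describes: when the data argument does not point at a local directory, treat it as a Hugging Face Hub dataset id and load it with the datasets library. The function name, return shape, and the placeholder dataset id are assumptions for illustration, not the mlx-lm implementation.

```python
from pathlib import Path
from datasets import load_dataset  # Hugging Face `datasets` package

# Illustrative sketch only: fall back to the HF Hub when --data is not a
# local directory containing {train, valid, test}.jsonl files.
def resolve_data(data: str):
    if Path(data).is_dir():
        return {"source": "local", "path": Path(data)}
    return {"source": "hub", "dataset": load_dataset(data)}  # e.g. "<hf_dataset_name>"
```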
madroid
7ec2021bb9 LoRA: support tools (function calling) format datasets (#995)
* LoRA: support fine-tuning tools datasets

* LoRA: Split small function

* LoRA: add tools format to lora docs

* LoRA: pre-commit fix

* Revert "LoRA: pre-commit fix"

This reverts commit b94b7e0fe7.

* Revert "LoRA: Split small function"

This reverts commit 3f6a5f19fd.

* LoRA: remove ToolsDataset

In a JSONL file, not every record is required to include the tools value.

* nit in readme

* nit in readme

* nit in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-09-28 10:41:36 -07:00
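For context, a hedged example of what one line of a tools-format JSONL file might look like, written as the Python dict it deserializes to. The field layout follows the OpenAI function-calling schema these commits reference, and the specific function (get_weather) is purely illustrative; per the "remove ToolsDataset" note above, individual records may omit the tools field.

```python
import json

record = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "tool_calls": [
            {"type": "function",
             "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}},
        ]},
    ],
    # Optional: not every record has to define tools.
    "tools": [
        {"type": "function",
         "function": {"name": "get_weather",
                      "description": "Look up the current weather for a city",
                      "parameters": {"type": "object",
                                     "properties": {"city": {"type": "string"}},
                                     "required": ["city"]}}},
    ],
}
print(json.dumps(record))  # one JSON object per line in train.jsonl
```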
Chime Ogbuji
df6bc09d74 Configuration-based use of HF hub-hosted datasets for training (#701)
* Add hf_dataset configuration for using HF hub-hosted datasets for (Q)LoRA training

* Pre-commit formatting

* Fix YAML config example

* Print DS info

* Include name

* Add hf_dataset parameter default

* Remove TextHFDataset and CompletionsHFDataset in favor of Dataset and CompletionsDataset: add a text_key constructor argument to the former (and change it to accept a provided data structure instead of only a JSON file), and add prompt_key and completion_key arguments to the latter, with defaults for backwards compatibility.

* nits

* update docs

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-06-26 10:20:50 -07:00
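A hedged sketch of an hf_dataset configuration entry, shown as the Python dict a YAML config would load into. The dataset id, split values, and exact key names are assumptions based on this commit and its follow-ups, not a verbatim copy of the documented configuration.

```python
# Hypothetical hf_dataset entry in a LoRA training config (assumed keys).
config = {
    "hf_dataset": {
        "name": "billsum",               # HF Hub dataset id (illustrative)
        "train_split": "train[:1000]",
        "valid_split": "train[-100:]",
        "prompt_feature": "text",        # column used as the prompt
        "completion_feature": "summary", # column used as the completion
    },
}
```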
madroid
b0bcd86a40 Support for OpenAI’s fine-tuning dataset format (#548)
* LoRA: move load_dataset to tuner/datasets.py file

* LoRA: support OpenAI chat format datasets

see https://platform.openai.com/docs/guides/fine-tuning/example-format

* LoRA: support OpenAI completion format datasets

* LoRA: adjust the timing of dataset formatting to reduce memory footprint

* Refactor dataset item access in PromptCompletionDataset

* Update mlx_lm/LORA.md

* Update mlx_lm/LORA.md

* check for unsupported data formats

* add tests, fine-tune doc

* add tests, fine-tune doc

* add jinja2 for chat template

* nits in readme

* nits in readme

---------

Co-authored-by: Awni Hannun <awni@apple.com>
2024-03-19 16:45:46 -07:00
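A hedged illustration of the two record shapes these commits add support for (OpenAI-style chat records and prompt/completion records), written as Python dicts that would be serialized one JSON object per line; the example contents are made up.

```python
import json

chat_record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ]
}
completion_record = {
    "prompt": "What is the capital of France?",
    "completion": "Paris.",
}
print(json.dumps(chat_record))        # one line of a chat-format train.jsonl
print(json.dumps(completion_record))  # one line of a completions-format train.jsonl
```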