Fixes#513
Implement the Direct Preference Optimization (DPO) method as a Reinforcement Learning from Human Feedback (RLHF) example.
* **Add DPO Functions**: Add `get_batched_logps` and `dpo_loss` functions to `llms/mlx_lm/utils.py` for DPO implementation.
* **Update Training Logic**: Update `llms/mlx_lm/tuner/trainer.py` to include DPO-specific training logic, including a new `dpo_loss` function and condition to check for DPO loss in the training loop.
* **Add Configuration Options**: Add configuration options for DPO in `llms/mlx_lm/examples/lora_config.yaml`.
* **Update Documentation**: Update `llms/mlx_lm/README.md` to include instructions for using DPO.
* **Add Unit Tests**: Add `llms/tests/test_dpo.py` with unit tests for `get_batched_logps`, `dpo_loss`, and DPO-specific training logic.
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/ml-explore/mlx-examples/issues/513?shareId=XXXX-XXXX-XXXX-XXXX).
- Optional completion only fine-tuning with `--mask-prompt`
- Collections of Hugging Face datasets
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* chore(mlx-lm): support text type content
* chore: optimize the messagef content processing
* nits + format
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Generalize prompt_feature and completion_feature for use in local datasets to facilitate compatibility with many other training dataset formats.
* Persist configured prompt/completion key
* rebase + nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* deepseekv3
* use upload_large_file instead of deprecated multi comit
* add pipeline generation and example
* comment
* get fp16 working
* use mlx==0.22
* Add support for multiturn fewshot examples and chat templates
Added two new arguments to the evaluation script: `--fewshot-as-multiturn` and `--apply-chat-template` which correspond to lm_eval options of similar names and are very often used to ensure apples-to-apples comparisons of lm_evaluation results
* Add HF overrides for methods needed by added options
* don't add duplicate bos
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* improvements to manage. Default value is N and size added to deletion confirmation.
* Fixing case for no case
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>
* Support for multiple EOS tokens
* Change _eos_token_ids type from list to set
* Remove model_config & add eos_token_id
* nits
---------
Co-authored-by: Awni Hannun <awni@apple.com>