mlx-examples

zhangyiss/mlx-examples

Mirror of https://github.com/ml-explore/mlx-examples.git, synced 2025-08-29 18:26:37 +08:00

Commit Graph

Author

SHA1

Message

Date

Anupam Mediratta

607c300e18

Add Direct Preference Optimization (DPO) method

Fixes #513

Implement the Direct Preference Optimization (DPO) method as a Reinforcement Learning from Human Feedback (RLHF) example.

* **Add DPO Functions**: Add `get_batched_logps` and `dpo_loss` functions to `llms/mlx_lm/utils.py` for DPO implementation.
* **Update Training Logic**: Update `llms/mlx_lm/tuner/trainer.py` to include DPO-specific training logic, including a new `dpo_loss` function and condition to check for DPO loss in the training loop.
* **Add Configuration Options**: Add configuration options for DPO in `llms/mlx_lm/examples/lora_config.yaml`.
* **Update Documentation**: Update `llms/mlx_lm/README.md` to include instructions for using DPO.
* **Add Unit Tests**: Add `llms/tests/test_dpo.py` with unit tests for `get_batched_logps`, `dpo_loss`, and DPO-specific training logic.
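
To illustrate what the two functions named above compute, here is a minimal, framework-free sketch of the standard DPO objective. It is not the code from this commit: the function names `sequence_logp` and `log_sigmoid`, all signatures, the plain-list inputs, and the default `beta=0.1` are assumptions made for illustration, and the commit itself presumably operates on MLX arrays rather than Python lists.

```python
import math


def sequence_logp(token_logps, mask):
    """Sum per-token log-probabilities over unmasked (non-padding) positions.

    Hypothetical stand-in for the role `get_batched_logps` plays, for one sequence.
    """
    return sum(lp for lp, m in zip(token_logps, mask) if m)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO loss over a batch of sequence-level log-probabilities.

    Each argument is a list with one log-probability per preference pair.
    The signature is an assumption, not the one used in the commit.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # How much more the policy prefers the chosen response over the
        # rejected one, relative to the frozen reference model.
        margin = (pc - rc) - (pr - rr)
        losses.append(-log_sigmoid(beta * margin))
    return sum(losses) / len(losses)


# Usage: one preference pair where the policy already favors the chosen answer.
chosen = sequence_logp([-0.5, -0.7, -9.0], [1, 1, 0])    # last token is padding
rejected = sequence_logp([-1.5, -2.0, -9.0], [1, 1, 0])
loss = dpo_loss([chosen], [rejected], [-2.0], [-2.0], beta=0.1)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; a positive margin pushes the loss below that value, which is the behavior a unit test for such a function would typically check.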

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/ml-explore/mlx-examples/issues/513?shareId=XXXX-XXXX-XXXX-XXXX).

2025-02-12 15:21:21 +05:30

Anchen

362e88a744

feat: move lora into mlx-lm (#337)

* feat: Add lora and qlora training to mlx-lm


---------

Co-authored-by: Awni Hannun <awni@apple.com>

2024-01-23 08:44:37 -08:00

Awni Hannun

c6440416a2

Mlx llm package (#301)

* fix converter

* add recursive files

* remove gitignore

* remove gitignore

* add packages properly

* read me update

* remove dup readme

* relative

* fix convert

* fix community name

* fix url

* version

2024-01-12 10:25:56 -08:00

3 Commits