Commit Graph

12 Commits

Author SHA1 Message Date
Goekdeniz-Guelmez 541677aa7f cleaning up 2025-01-31 21:36:24 +01:00
Goekdeniz-Guelmez 2f2ddd4811 clean up 2025-01-26 15:17:06 +01:00
Goekdeniz-Guelmez d8e7834345 Removed rejected_rewards handling, Updated batch unpacking to match iterator, Updated batch unpacking to match iterator, Added preference score scaling, Simplified reward calculation, Removed redundant rejected_rewards 2025-01-25 21:35:37 +01:00
Goekdeniz-Guelmez 09ed837896 updates 2025-01-24 16:57:18 +01:00
Goekdeniz-Guelmez e3688293ed removing dpo and fixing some stuff for orpo 2025-01-24 16:09:22 +01:00
Goekdeniz-Guelmez 0bb001121e niits 2025-01-22 21:39:29 +01:00
Goekdeniz-Guelmez 363bde634e fixes 2025-01-19 13:45:33 +01:00
Goekdeniz-Guelmez ea0d11cd2f update 2025-01-19 02:05:43 +01:00
Goekdeniz-Guelmez 424cb854e9 nits 2025-01-19 02:03:50 +01:00
Goekdeniz-Guelmez 9ede9db19b nits 2025-01-19 02:03:31 +01:00
Goekdeniz-Guelmez fa80d081f2 finish 2025-01-19 01:58:29 +01:00
Goekdeniz-Guelmez a9b7609118 initial commit 2025-01-19 01:09:43 +01:00