Goekdeniz-Guelmez | 541677aa7f | cleaning up | 2025-01-31 21:36:24 +01:00 |
Goekdeniz-Guelmez | 2f2ddd4811 | clean up | 2025-01-26 15:17:06 +01:00 |
Goekdeniz-Guelmez | d8e7834345 | Removed rejected_rewards handling, Updated batch unpacking to match iterator, Updated batch unpacking to match iterator, Added preference score scaling, Simplified reward calculation, Removed redundant rejected_rewards | 2025-01-25 21:35:37 +01:00 |
Goekdeniz-Guelmez | 09ed837896 | updates | 2025-01-24 16:57:18 +01:00 |
Goekdeniz-Guelmez | e3688293ed | removing dpo and fixing some stuff for orpo | 2025-01-24 16:09:22 +01:00 |
Goekdeniz-Guelmez | 0bb001121e | niits | 2025-01-22 21:39:29 +01:00 |
Goekdeniz-Guelmez | 363bde634e | fixes | 2025-01-19 13:45:33 +01:00 |
Goekdeniz-Guelmez | ea0d11cd2f | update | 2025-01-19 02:05:43 +01:00 |
Goekdeniz-Guelmez | 424cb854e9 | nits | 2025-01-19 02:03:50 +01:00 |
Goekdeniz-Guelmez | 9ede9db19b | nits | 2025-01-19 02:03:31 +01:00 |
Goekdeniz-Guelmez | fa80d081f2 | finish | 2025-01-19 01:58:29 +01:00 |
Goekdeniz-Guelmez | a9b7609118 | initial commit | 2025-01-19 01:09:43 +01:00 |