Commit Graph

61 Commits

Author SHA1 Message Date
Goekdeniz-Guelmez 05d921b788 optims 2025-02-03 19:37:05 +01:00
Goekdeniz-Guelmez 1d9e4802f0 first working prototype, will try training out at home 2025-02-03 12:05:29 +01:00
Goekdeniz-Guelmez 23d75cd7ad starting fist training test run 2025-02-03 10:08:28 +01:00
Goekdeniz-Guelmez a3ed632422 dataset wrapper done 2025-02-03 09:13:17 +01:00
Goekdeniz-Guelmez d034ca369e adding function for R1 2025-02-03 08:26:42 +01:00
Goekdeniz-Guelmez 243c9621d9 update lora.py 2025-01-31 21:10:44 +01:00
Goekdeniz-Guelmez a57d553fc1 update 2025-01-31 16:57:43 +01:00
Goekdeniz-Guelmez 80bcf68956 grpo_trainer shoudl be done 2025-01-31 16:54:18 +01:00
Goekdeniz-Guelmez 6c58aa995c updates 2025-01-31 16:27:31 +01:00
Goekdeniz-Guelmez 93370ff1c3 updates ans fixing the KL div lines 2025-01-30 23:55:40 +01:00
Goekdeniz-Guelmez 5e0ae83487 initial commit, gn 2025-01-29 00:19:07 +01:00