Commit Graph

32 Commits

Author  SHA1  Message  Date
Goekdeniz-Guelmez  e96afe9e9f  updates  2025-02-11 09:09:28 +01:00
Goekdeniz-Guelmez  88ca747e9e  nits  2025-02-10 19:46:19 +01:00
Goekdeniz-Guelmez  b7bc811507  nits  2025-02-10 19:45:19 +01:00
Goekdeniz-Guelmez  e5aa2c3b5d  nits  2025-02-10 17:51:14 +01:00
Goekdeniz-Guelmez  f88e897019  removing helper functions  2025-02-10 16:07:28 +01:00
Goekdeniz-Guelmez  d9da35f458  nits  2025-02-10 10:52:32 +01:00
Goekdeniz-Guelmez  00712522ba  rebase loss calculation  2025-02-09 17:13:05 +01:00
Goekdeniz-Guelmez  a527cdb39b  fix: prevent gradients from flowing through the reference model's logits  2025-02-09 17:02:58 +01:00
Goekdeniz-Guelmez  54179901b5  fix  2025-02-09 15:41:47 +01:00
Goekdeniz-Guelmez  9ba6146a76  fix  2025-02-09 14:32:50 +01:00
Goekdeniz-Guelmez  bcfa55d882  updates  2025-02-05 15:02:12 +01:00
Goekdeniz-Guelmez  0a19522ec4  updates  2025-02-05 14:38:09 +01:00
Goekdeniz-Guelmez  35a2d99cf9  smoll fix  2025-02-05 11:30:21 +01:00
Goekdeniz-Guelmez  a33cad84b4  udpates  2025-02-05 09:48:00 +01:00
Goekdeniz-Guelmez  2a8e6f6e44  udpate  2025-02-05 08:47:03 +01:00
Goekdeniz-Guelmez  0a09a93454  fix cache handling  2025-02-05 08:44:06 +01:00
Goekdeniz-Guelmez  7173840283  first succesfull training run  2025-02-04 09:18:45 +01:00
Goekdeniz-Guelmez  ca32424043  updates  2025-02-03 21:57:26 +01:00
Goekdeniz-Guelmez  54e295ea80  fix name funcs  2025-02-03 19:56:11 +01:00
Goekdeniz-Guelmez  06f9c29c94  print func name  2025-02-03 19:47:40 +01:00
Goekdeniz-Guelmez  40bca770ae  fixes  2025-02-03 19:43:49 +01:00
Goekdeniz-Guelmez  05d921b788  optims  2025-02-03 19:37:05 +01:00
Goekdeniz-Guelmez  1d9e4802f0  first working prototype, will try training out at home  2025-02-03 12:05:29 +01:00
Goekdeniz-Guelmez  23d75cd7ad  starting fist training test run  2025-02-03 10:08:28 +01:00
Goekdeniz-Guelmez  a3ed632422  dataset wrapper done  2025-02-03 09:13:17 +01:00
Goekdeniz-Guelmez  d034ca369e  adding function for R1  2025-02-03 08:26:42 +01:00
Goekdeniz-Guelmez  243c9621d9  update lora.py  2025-01-31 21:10:44 +01:00
Goekdeniz-Guelmez  a57d553fc1  update  2025-01-31 16:57:43 +01:00
Goekdeniz-Guelmez  80bcf68956  grpo_trainer shoudl be done  2025-01-31 16:54:18 +01:00
Goekdeniz-Guelmez  6c58aa995c  updates  2025-01-31 16:27:31 +01:00
Goekdeniz-Guelmez  93370ff1c3  updates ans fixing the KL div lines  2025-01-30 23:55:40 +01:00
Goekdeniz-Guelmez  5e0ae83487  initial commit, gn  2025-01-29 00:19:07 +01:00