metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1115
- Num Input Tokens Seen: 63358856
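For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted with a one-line calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.1115

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.3f}")
```

Under that assumption the evaluation perplexity is roughly 3.04.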
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
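The relationship between the batch-size entries, and the shape of the `constant_with_warmup` schedule, can be sketched in plain Python. The device count and the `total_steps` value below are assumptions made for illustration (the card states neither), and `constant_with_warmup` is a hypothetical helper that mirrors the usual linear-ramp-then-constant behaviour, not code taken from the training run.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single device, which is what makes the product equal 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def constant_with_warmup(step, warmup_steps, base_lr=learning_rate):
    """Ramp linearly from 0 to base_lr over warmup_steps, then stay constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


# Hypothetical step count for illustration; the card does not give the total.
total_steps = 1000
warmup_steps = math.ceil(total_steps * warmup_ratio)

for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, constant_with_warmup(step, warmup_steps))
```

With these inputs the effective batch size works out to 128, matching the `total_train_batch_size` entry, and the learning rate reaches 8e-06 once warmup ends.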
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6281 | 0.0043 | 5 | 1.3934 | 274928 |
1.6766 | 0.0085 | 10 | 1.3767 | 544864 |
1.667 | 0.0128 | 15 | 1.3401 | 809432 |
1.509 | 0.0170 | 20 | 1.2828 | 1087464 |
1.4846 | 0.0213 | 25 | 1.2412 | 1359432 |
1.4854 | 0.0256 | 30 | 1.2019 | 1633040 |
1.3541 | 0.0298 | 35 | 1.1764 | 1897344 |
1.3155 | 0.0341 | 40 | 1.1756 | 2165096 |
1.0909 | 0.0384 | 45 | 1.1668 | 2444600 |
1.1077 | 0.0426 | 50 | 1.1836 | 2713008 |
0.9467 | 0.0469 | 55 | 1.1976 | 2984504 |
0.8303 | 0.0511 | 60 | 1.2374 | 3258672 |
0.7233 | 0.0554 | 65 | 1.2594 | 3526576 |
0.6752 | 0.0597 | 70 | 1.2789 | 3802264 |
0.6404 | 0.0639 | 75 | 1.2678 | 4073752 |
0.503 | 0.0682 | 80 | 1.2459 | 4336176 |
0.6266 | 0.0724 | 85 | 1.2244 | 4604864 |
0.4903 | 0.0767 | 90 | 1.2451 | 4879872 |
0.3967 | 0.0810 | 95 | 1.2479 | 5147992 |
0.478 | 0.0852 | 100 | 1.2385 | 5423432 |
0.3776 | 0.0895 | 105 | 1.2452 | 5690880 |
0.5358 | 0.0938 | 110 | 1.2197 | 5960008 |
0.3677 | 0.0980 | 115 | 1.2249 | 6227992 |
0.3285 | 0.1023 | 120 | 1.2273 | 6491128 |
0.3436 | 0.1065 | 125 | 1.2202 | 6750096 |
0.3619 | 0.1108 | 130 | 1.2275 | 7019536 |
0.2643 | 0.1151 | 135 | 1.2136 | 7290928 |
0.3383 | 0.1193 | 140 | 1.2097 | 7559208 |
0.2027 | 0.1236 | 145 | 1.2155 | 7833224 |
0.268 | 0.1278 | 150 | 1.2159 | 8099480 |
0.1734 | 0.1321 | 155 | 1.2109 | 8367592 |
0.2535 | 0.1364 | 160 | 1.2149 | 8642344 |
0.3114 | 0.1406 | 165 | 1.2048 | 8910480 |
0.2827 | 0.1449 | 170 | 1.2074 | 9185168 |
0.241 | 0.1492 | 175 | 1.2048 | 9457328 |
0.2103 | 0.1534 | 180 | 1.2092 | 9726280 |
0.2186 | 0.1577 | 185 | 1.2012 | 10003424 |
0.2764 | 0.1619 | 190 | 1.2015 | 10272208 |
0.3248 | 0.1662 | 195 | 1.2095 | 10544896 |
0.1936 | 0.1705 | 200 | 1.1958 | 10812488 |
0.2836 | 0.1747 | 205 | 1.2073 | 11089456 |
0.2181 | 0.1790 | 210 | 1.1978 | 11359256 |
0.2808 | 0.1832 | 215 | 1.2115 | 11622536 |
0.2303 | 0.1875 | 220 | 1.2023 | 11894088 |
0.2649 | 0.1918 | 225 | 1.1951 | 12169736 |
0.163 | 0.1960 | 230 | 1.2074 | 12438536 |
0.2623 | 0.2003 | 235 | 1.1926 | 12712496 |
0.2829 | 0.2045 | 240 | 1.1889 | 12973344 |
0.1654 | 0.2088 | 245 | 1.2087 | 13238288 |
0.2101 | 0.2131 | 250 | 1.1974 | 13497064 |
0.2679 | 0.2173 | 255 | 1.1970 | 13762864 |
0.21 | 0.2216 | 260 | 1.2069 | 14031040 |
0.2882 | 0.2259 | 265 | 1.1903 | 14305152 |
0.161 | 0.2301 | 270 | 1.1928 | 14565352 |
0.212 | 0.2344 | 275 | 1.1942 | 14832368 |
0.2318 | 0.2386 | 280 | 1.1857 | 15100944 |
0.2449 | 0.2429 | 285 | 1.1835 | 15373648 |
0.2106 | 0.2472 | 290 | 1.1859 | 15643152 |
0.1681 | 0.2514 | 295 | 1.1896 | 15916112 |
0.1699 | 0.2557 | 300 | 1.1857 | 16185920 |
0.1487 | 0.2599 | 305 | 1.1913 | 16451496 |
0.1664 | 0.2642 | 310 | 1.1857 | 16711480 |
0.187 | 0.2685 | 315 | 1.1839 | 16978216 |
0.1805 | 0.2727 | 320 | 1.1826 | 17252816 |
0.113 | 0.2770 | 325 | 1.1844 | 17524376 |
0.1424 | 0.2813 | 330 | 1.1780 | 17797640 |
0.1878 | 0.2855 | 335 | 1.1773 | 18070776 |
0.1559 | 0.2898 | 340 | 1.1812 | 18340296 |
0.1767 | 0.2940 | 345 | 1.1787 | 18607856 |
0.1796 | 0.2983 | 350 | 1.1758 | 18877784 |
0.1905 | 0.3026 | 355 | 1.1769 | 19145032 |
0.2067 | 0.3068 | 360 | 1.1701 | 19411080 |
0.1735 | 0.3111 | 365 | 1.1703 | 19683120 |
0.1467 | 0.3153 | 370 | 1.1710 | 19954368 |
0.1899 | 0.3196 | 375 | 1.1679 | 20220928 |
0.1742 | 0.3239 | 380 | 1.1711 | 20484120 |
0.1914 | 0.3281 | 385 | 1.1710 | 20756952 |
0.2192 | 0.3324 | 390 | 1.1657 | 21020880 |
0.1977 | 0.3367 | 395 | 1.1681 | 21289296 |
0.2191 | 0.3409 | 400 | 1.1701 | 21550608 |
0.2034 | 0.3452 | 405 | 1.1586 | 21819224 |
0.0956 | 0.3494 | 410 | 1.1638 | 22091752 |
0.2299 | 0.3537 | 415 | 1.1632 | 22362592 |
0.2555 | 0.3580 | 420 | 1.1605 | 22636728 |
0.1571 | 0.3622 | 425 | 1.1625 | 22904584 |
0.1971 | 0.3665 | 430 | 1.1617 | 23172496 |
0.184 | 0.3707 | 435 | 1.1643 | 23445400 |
0.1952 | 0.3750 | 440 | 1.1570 | 23714944 |
0.1909 | 0.3793 | 445 | 1.1621 | 23982496 |
0.1895 | 0.3835 | 450 | 1.1596 | 24260496 |
0.2287 | 0.3878 | 455 | 1.1568 | 24522088 |
0.2005 | 0.3921 | 460 | 1.1558 | 24794496 |
0.2329 | 0.3963 | 465 | 1.1526 | 25068160 |
0.1996 | 0.4006 | 470 | 1.1563 | 25338592 |
0.1696 | 0.4048 | 475 | 1.1597 | 25605840 |
0.1521 | 0.4091 | 480 | 1.1532 | 25868568 |
0.22 | 0.4134 | 485 | 1.1542 | 26134856 |
0.1432 | 0.4176 | 490 | 1.1551 | 26399024 |
0.2086 | 0.4219 | 495 | 1.1515 | 26672664 |
0.2092 | 0.4261 | 500 | 1.1499 | 26942488 |
0.2645 | 0.4304 | 505 | 1.1498 | 27215440 |
0.1131 | 0.4347 | 510 | 1.1493 | 27490712 |
0.1224 | 0.4389 | 515 | 1.1499 | 27763952 |
0.1864 | 0.4432 | 520 | 1.1474 | 28039296 |
0.1936 | 0.4475 | 525 | 1.1494 | 28312376 |
0.1106 | 0.4517 | 530 | 1.1513 | 28585312 |
0.1908 | 0.4560 | 535 | 1.1464 | 28865768 |
0.1765 | 0.4602 | 540 | 1.1448 | 29134744 |
0.1541 | 0.4645 | 545 | 1.1474 | 29409200 |
0.2046 | 0.4688 | 550 | 1.1473 | 29681272 |
0.2496 | 0.4730 | 555 | 1.1475 | 29955096 |
0.1368 | 0.4773 | 560 | 1.1476 | 30226144 |
0.1218 | 0.4815 | 565 | 1.1459 | 30504928 |
0.2231 | 0.4858 | 570 | 1.1427 | 30776800 |
0.163 | 0.4901 | 575 | 1.1443 | 31046360 |
0.2059 | 0.4943 | 580 | 1.1467 | 31322592 |
0.1813 | 0.4986 | 585 | 1.1417 | 31595528 |
0.1555 | 0.5028 | 590 | 1.1418 | 31860016 |
0.2493 | 0.5071 | 595 | 1.1478 | 32134176 |
0.2077 | 0.5114 | 600 | 1.1402 | 32404440 |
0.1179 | 0.5156 | 605 | 1.1431 | 32677328 |
0.1535 | 0.5199 | 610 | 1.1459 | 32946528 |
0.1342 | 0.5242 | 615 | 1.1429 | 33211040 |
0.1166 | 0.5284 | 620 | 1.1404 | 33480416 |
0.1949 | 0.5327 | 625 | 1.1422 | 33754328 |
0.1487 | 0.5369 | 630 | 1.1394 | 34023400 |
0.2949 | 0.5412 | 635 | 1.1387 | 34301016 |
0.1005 | 0.5455 | 640 | 1.1392 | 34568856 |
0.1058 | 0.5497 | 645 | 1.1401 | 34835552 |
0.2073 | 0.5540 | 650 | 1.1370 | 35107216 |
0.1519 | 0.5582 | 655 | 1.1394 | 35380776 |
0.1598 | 0.5625 | 660 | 1.1409 | 35650240 |
0.1849 | 0.5668 | 665 | 1.1389 | 35923440 |
0.2379 | 0.5710 | 670 | 1.1376 | 36202688 |
0.1738 | 0.5753 | 675 | 1.1405 | 36468728 |
0.1388 | 0.5796 | 680 | 1.1376 | 36743760 |
0.1362 | 0.5838 | 685 | 1.1347 | 37008264 |
0.1655 | 0.5881 | 690 | 1.1386 | 37274384 |
0.1673 | 0.5923 | 695 | 1.1373 | 37542432 |
0.1822 | 0.5966 | 700 | 1.1318 | 37814184 |
0.2159 | 0.6009 | 705 | 1.1359 | 38079800 |
0.0818 | 0.6051 | 710 | 1.1377 | 38353440 |
0.1606 | 0.6094 | 715 | 1.1314 | 38622312 |
0.2045 | 0.6136 | 720 | 1.1298 | 38888976 |
0.1382 | 0.6179 | 725 | 1.1335 | 39157440 |
0.1373 | 0.6222 | 730 | 1.1324 | 39424448 |
0.1212 | 0.6264 | 735 | 1.1321 | 39685152 |
0.1736 | 0.6307 | 740 | 1.1328 | 39955296 |
0.2216 | 0.6350 | 745 | 1.1322 | 40224264 |
0.159 | 0.6392 | 750 | 1.1309 | 40489880 |
0.2088 | 0.6435 | 755 | 1.1308 | 40760920 |
0.1727 | 0.6477 | 760 | 1.1288 | 41036304 |
0.1264 | 0.6520 | 765 | 1.1313 | 41308776 |
0.143 | 0.6563 | 770 | 1.1341 | 41587328 |
0.1625 | 0.6605 | 775 | 1.1328 | 41858424 |
0.2043 | 0.6648 | 780 | 1.1314 | 42130400 |
0.1761 | 0.6690 | 785 | 1.1297 | 42408016 |
0.1527 | 0.6733 | 790 | 1.1288 | 42679608 |
0.1701 | 0.6776 | 795 | 1.1300 | 42942480 |
0.2072 | 0.6818 | 800 | 1.1308 | 43218032 |
0.1241 | 0.6861 | 805 | 1.1280 | 43496640 |
0.1741 | 0.6904 | 810 | 1.1297 | 43772392 |
0.174 | 0.6946 | 815 | 1.1314 | 44035960 |
0.1829 | 0.6989 | 820 | 1.1298 | 44311064 |
0.1704 | 0.7031 | 825 | 1.1278 | 44579240 |
0.174 | 0.7074 | 830 | 1.1272 | 44853824 |
0.1689 | 0.7117 | 835 | 1.1281 | 45122712 |
0.1658 | 0.7159 | 840 | 1.1275 | 45396376 |
0.1402 | 0.7202 | 845 | 1.1272 | 45659992 |
0.1506 | 0.7244 | 850 | 1.1272 | 45938136 |
0.1741 | 0.7287 | 855 | 1.1271 | 46205480 |
0.1701 | 0.7330 | 860 | 1.1284 | 46465984 |
0.1296 | 0.7372 | 865 | 1.1305 | 46725880 |
0.222 | 0.7415 | 870 | 1.1291 | 46996672 |
0.1797 | 0.7458 | 875 | 1.1266 | 47262816 |
0.1854 | 0.7500 | 880 | 1.1240 | 47542048 |
0.1431 | 0.7543 | 885 | 1.1235 | 47820544 |
0.1636 | 0.7585 | 890 | 1.1276 | 48087000 |
0.1267 | 0.7628 | 895 | 1.1238 | 48357488 |
0.1658 | 0.7671 | 900 | 1.1219 | 48629528 |
0.1863 | 0.7713 | 905 | 1.1293 | 48899952 |
0.1718 | 0.7756 | 910 | 1.1283 | 49171336 |
0.2038 | 0.7798 | 915 | 1.1232 | 49434544 |
0.1561 | 0.7841 | 920 | 1.1253 | 49697136 |
0.1312 | 0.7884 | 925 | 1.1250 | 49959432 |
0.1334 | 0.7926 | 930 | 1.1256 | 50228984 |
0.1727 | 0.7969 | 935 | 1.1272 | 50492880 |
0.1703 | 0.8012 | 940 | 1.1220 | 50761584 |
0.1577 | 0.8054 | 945 | 1.1225 | 51026360 |
0.1965 | 0.8097 | 950 | 1.1240 | 51298120 |
0.128 | 0.8139 | 955 | 1.1219 | 51565616 |
0.1908 | 0.8182 | 960 | 1.1229 | 51835072 |
0.1182 | 0.8225 | 965 | 1.1249 | 52098072 |
0.121 | 0.8267 | 970 | 1.1219 | 52363024 |
0.1408 | 0.8310 | 975 | 1.1221 | 52633408 |
0.1766 | 0.8352 | 980 | 1.1262 | 52901664 |
0.1924 | 0.8395 | 985 | 1.1221 | 53172064 |
0.0933 | 0.8438 | 990 | 1.1195 | 53439088 |
0.1967 | 0.8480 | 995 | 1.1248 | 53708936 |
0.2113 | 0.8523 | 1000 | 1.1240 | 53972568 |
0.1773 | 0.8565 | 1005 | 1.1232 | 54246864 |
0.1417 | 0.8608 | 1010 | 1.1192 | 54508856 |
0.1155 | 0.8651 | 1015 | 1.1206 | 54774728 |
0.2128 | 0.8693 | 1020 | 1.1234 | 55045544 |
0.1417 | 0.8736 | 1025 | 1.1195 | 55322176 |
0.1838 | 0.8779 | 1030 | 1.1187 | 55594384 |
0.1663 | 0.8821 | 1035 | 1.1188 | 55867672 |
0.1767 | 0.8864 | 1040 | 1.1180 | 56144856 |
0.0898 | 0.8906 | 1045 | 1.1197 | 56423512 |
0.2001 | 0.8949 | 1050 | 1.1176 | 56700800 |
0.2134 | 0.8992 | 1055 | 1.1176 | 56967968 |
0.1479 | 0.9034 | 1060 | 1.1187 | 57237424 |
0.135 | 0.9077 | 1065 | 1.1166 | 57512288 |
0.1949 | 0.9119 | 1070 | 1.1154 | 57785904 |
0.1553 | 0.9162 | 1075 | 1.1168 | 58061496 |
0.1645 | 0.9205 | 1080 | 1.1176 | 58337408 |
0.1523 | 0.9247 | 1085 | 1.1163 | 58612048 |
0.2493 | 0.9290 | 1090 | 1.1176 | 58885584 |
0.1114 | 0.9333 | 1095 | 1.1175 | 59162552 |
0.1506 | 0.9375 | 1100 | 1.1158 | 59425968 |
0.1647 | 0.9418 | 1105 | 1.1154 | 59701184 |
0.1982 | 0.9460 | 1110 | 1.1149 | 59966112 |
0.1758 | 0.9503 | 1115 | 1.1135 | 60230008 |
0.1625 | 0.9546 | 1120 | 1.1132 | 60497792 |
0.1389 | 0.9588 | 1125 | 1.1136 | 60773880 |
0.1207 | 0.9631 | 1130 | 1.1148 | 61040768 |
0.1471 | 0.9673 | 1135 | 1.1152 | 61313120 |
0.162 | 0.9716 | 1140 | 1.1124 | 61587408 |
0.1433 | 0.9759 | 1145 | 1.1122 | 61853400 |
0.1479 | 0.9801 | 1150 | 1.1130 | 62126856 |
0.1043 | 0.9844 | 1155 | 1.1137 | 62392328 |
0.125 | 0.9887 | 1160 | 1.1138 | 62659824 |
0.2094 | 0.9929 | 1165 | 1.1131 | 62932648 |
0.1797 | 0.9972 | 1170 | 1.1126 | 63200224 |
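The last logged row allows a rough throughput estimate. The sketch below uses only numbers from this card; the interpretation is an assumption, since the card does not say whether "Input Tokens Seen" includes padding, and the epoch column is rounded to four decimals.

```python
# Last logged row of the table above.
last_step = 1170
last_epoch = 0.9972
tokens_seen = 63_200_224

# From the hyperparameter list.
total_train_batch_size = 128

# Average tokens consumed per optimizer step and per training sequence.
tokens_per_step = tokens_seen / last_step
tokens_per_sequence = tokens_per_step / total_train_batch_size

# Estimated number of optimizer steps in the full epoch (approximate,
# because the epoch value in the table is rounded).
estimated_total_steps = round(last_step / last_epoch)

print(f"tokens per step:     {tokens_per_step:.0f}")
print(f"tokens per sequence: {tokens_per_sequence:.0f}")
print(f"estimated steps:     {estimated_total_steps}")
```

This works out to about 54,000 tokens per optimizer step, or about 422 tokens per sequence on average, over an epoch of roughly 1,173 steps.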
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
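To compare a local environment against the versions listed above, something like the following could be used. It is a sketch: the pip distribution names (for example `torch` for Pytorch) are assumptions, and `installed_versions` and `mismatches` are hypothetical helpers, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed pip distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_versions(packages):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, found):
    """Return {name: (expected, found)} for every package that differs."""
    return {
        name: (want, found.get(name))
        for name, want in expected.items()
        if found.get(name) != want
    }


if __name__ == "__main__":
    for name, (want, got) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
        print(f"{name}: card lists {want}, found {got}")
```

An exact string match is deliberately strict; a different CUDA build suffix on `torch` will be reported as a mismatch even when the release number agrees.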