metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1163
- Num Input Tokens Seen: 79329904
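The evaluation loss can be easier to interpret as a perplexity. The snippet below is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the card does not state this). Under that assumption the loss of 1.1163 corresponds to a perplexity of about 3.05.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.1163

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```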
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
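Two of the values above are derived rather than independent. The sketch below shows how `total_train_batch_size` follows from the per-device batch size and gradient accumulation, and what a constant-with-warmup schedule might look like. It assumes a single device, a linear ramp during warmup, and a warmup length rounded up from `total_steps * warmup_ratio`. The names `num_devices`, `constant_with_warmup`, and `example_total_steps` are hypothetical and used only for illustration.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single device, so there is no data-parallel multiplier.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def constant_with_warmup(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear ramp, then constant."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


# Hypothetical step count, chosen for illustration only.
example_total_steps = 1000

print(total_train_batch_size)
print(constant_with_warmup(0, example_total_steps))
print(constant_with_warmup(500, example_total_steps))
```

With these inputs the product 8 x 16 matches the listed `total_train_batch_size` of 128, and the learning rate stays at 8e-06 once warmup has finished.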
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6259 | 0.0034 | 5 | 1.3948 | 269192 |
1.7205 | 0.0067 | 10 | 1.3856 | 535744 |
1.6747 | 0.0101 | 15 | 1.3559 | 807760 |
1.6258 | 0.0135 | 20 | 1.3143 | 1073304 |
1.4352 | 0.0168 | 25 | 1.2657 | 1347048 |
1.3674 | 0.0202 | 30 | 1.2317 | 1615280 |
1.3574 | 0.0236 | 35 | 1.1989 | 1885648 |
1.3486 | 0.0269 | 40 | 1.1784 | 2156656 |
1.2541 | 0.0303 | 45 | 1.1793 | 2421504 |
1.1753 | 0.0337 | 50 | 1.1790 | 2687560 |
1.0309 | 0.0371 | 55 | 1.1886 | 2951648 |
0.9656 | 0.0404 | 60 | 1.2106 | 3227216 |
0.8768 | 0.0438 | 65 | 1.2146 | 3497872 |
0.6429 | 0.0472 | 70 | 1.2774 | 3764712 |
0.7068 | 0.0505 | 75 | 1.2819 | 4030952 |
0.6668 | 0.0539 | 80 | 1.2590 | 4301160 |
0.5531 | 0.0573 | 85 | 1.2669 | 4568312 |
0.418 | 0.0606 | 90 | 1.2660 | 4834208 |
0.351 | 0.0640 | 95 | 1.2736 | 5100584 |
0.4759 | 0.0674 | 100 | 1.2518 | 5363000 |
0.3357 | 0.0707 | 105 | 1.2374 | 5633512 |
0.4113 | 0.0741 | 110 | 1.2428 | 5897448 |
0.3705 | 0.0775 | 115 | 1.2439 | 6165216 |
0.2944 | 0.0808 | 120 | 1.2403 | 6424248 |
0.3285 | 0.0842 | 125 | 1.2456 | 6681496 |
0.3726 | 0.0876 | 130 | 1.2270 | 6952128 |
0.2679 | 0.0910 | 135 | 1.2238 | 7220384 |
0.252 | 0.0943 | 140 | 1.2258 | 7487104 |
0.3314 | 0.0977 | 145 | 1.2162 | 7763376 |
0.2311 | 0.1011 | 150 | 1.2244 | 8032536 |
0.3076 | 0.1044 | 155 | 1.2139 | 8303104 |
0.2792 | 0.1078 | 160 | 1.2208 | 8566312 |
0.2223 | 0.1112 | 165 | 1.2174 | 8842536 |
0.2285 | 0.1145 | 170 | 1.2217 | 9101304 |
0.2407 | 0.1179 | 175 | 1.2076 | 9369776 |
0.2957 | 0.1213 | 180 | 1.2121 | 9634840 |
0.3073 | 0.1246 | 185 | 1.2132 | 9899744 |
0.2155 | 0.1280 | 190 | 1.2084 | 10168352 |
0.206 | 0.1314 | 195 | 1.2147 | 10440512 |
0.2419 | 0.1347 | 200 | 1.2024 | 10707544 |
0.1884 | 0.1381 | 205 | 1.2068 | 10975600 |
0.1609 | 0.1415 | 210 | 1.2079 | 11242968 |
0.2126 | 0.1448 | 215 | 1.2093 | 11511848 |
0.2478 | 0.1482 | 220 | 1.2074 | 11783608 |
0.2409 | 0.1516 | 225 | 1.2116 | 12054688 |
0.2337 | 0.1550 | 230 | 1.2080 | 12318472 |
0.1825 | 0.1583 | 235 | 1.2044 | 12595544 |
0.2339 | 0.1617 | 240 | 1.2087 | 12861552 |
0.208 | 0.1651 | 245 | 1.2029 | 13125424 |
0.17 | 0.1684 | 250 | 1.2086 | 13400288 |
0.1689 | 0.1718 | 255 | 1.1979 | 13669320 |
0.2218 | 0.1752 | 260 | 1.1995 | 13933120 |
0.2991 | 0.1785 | 265 | 1.2025 | 14196528 |
0.1689 | 0.1819 | 270 | 1.2027 | 14457312 |
0.1555 | 0.1853 | 275 | 1.2016 | 14729752 |
0.2401 | 0.1886 | 280 | 1.2039 | 14996792 |
0.1565 | 0.1920 | 285 | 1.1978 | 15263432 |
0.1755 | 0.1954 | 290 | 1.1972 | 15524472 |
0.2002 | 0.1987 | 295 | 1.1961 | 15796048 |
0.1917 | 0.2021 | 300 | 1.1959 | 16062768 |
0.1862 | 0.2055 | 305 | 1.2027 | 16340792 |
0.2141 | 0.2089 | 310 | 1.1931 | 16617552 |
0.214 | 0.2122 | 315 | 1.1955 | 16882528 |
0.1397 | 0.2156 | 320 | 1.2007 | 17143744 |
0.2168 | 0.2190 | 325 | 1.1883 | 17411360 |
0.118 | 0.2223 | 330 | 1.1881 | 17675528 |
0.1473 | 0.2257 | 335 | 1.2000 | 17939752 |
0.1937 | 0.2291 | 340 | 1.1878 | 18206936 |
0.1524 | 0.2324 | 345 | 1.1905 | 18474736 |
0.2195 | 0.2358 | 350 | 1.1886 | 18743480 |
0.1622 | 0.2392 | 355 | 1.1857 | 19012232 |
0.2023 | 0.2425 | 360 | 1.1899 | 19276600 |
0.1584 | 0.2459 | 365 | 1.1858 | 19551832 |
0.1618 | 0.2493 | 370 | 1.1871 | 19827848 |
0.0807 | 0.2526 | 375 | 1.1862 | 20094104 |
0.1594 | 0.2560 | 380 | 1.1878 | 20360880 |
0.1828 | 0.2594 | 385 | 1.1860 | 20629120 |
0.237 | 0.2627 | 390 | 1.1832 | 20897088 |
0.1511 | 0.2661 | 395 | 1.1786 | 21163952 |
0.1626 | 0.2695 | 400 | 1.1789 | 21434224 |
0.1721 | 0.2729 | 405 | 1.1772 | 21704128 |
0.1283 | 0.2762 | 410 | 1.1768 | 21973760 |
0.1534 | 0.2796 | 415 | 1.1766 | 22243480 |
0.1461 | 0.2830 | 420 | 1.1790 | 22512848 |
0.0818 | 0.2863 | 425 | 1.1775 | 22775944 |
0.2356 | 0.2897 | 430 | 1.1781 | 23042216 |
0.1638 | 0.2931 | 435 | 1.1765 | 23311872 |
0.1251 | 0.2964 | 440 | 1.1715 | 23583792 |
0.2321 | 0.2998 | 445 | 1.1699 | 23852472 |
0.2209 | 0.3032 | 450 | 1.1726 | 24121504 |
0.1771 | 0.3065 | 455 | 1.1655 | 24388704 |
0.1644 | 0.3099 | 460 | 1.1713 | 24657496 |
0.1551 | 0.3133 | 465 | 1.1684 | 24921776 |
0.1607 | 0.3166 | 470 | 1.1673 | 25186256 |
0.0927 | 0.3200 | 475 | 1.1677 | 25456584 |
0.1523 | 0.3234 | 480 | 1.1645 | 25725384 |
0.1665 | 0.3268 | 485 | 1.1661 | 25997384 |
0.209 | 0.3301 | 490 | 1.1679 | 26258576 |
0.1595 | 0.3335 | 495 | 1.1630 | 26524160 |
0.1801 | 0.3369 | 500 | 1.1627 | 26790776 |
0.1483 | 0.3402 | 505 | 1.1632 | 27054416 |
0.1868 | 0.3436 | 510 | 1.1590 | 27322560 |
0.1118 | 0.3470 | 515 | 1.1624 | 27587584 |
0.1938 | 0.3503 | 520 | 1.1631 | 27851280 |
0.1447 | 0.3537 | 525 | 1.1569 | 28118456 |
0.1759 | 0.3571 | 530 | 1.1613 | 28376264 |
0.207 | 0.3604 | 535 | 1.1612 | 28642960 |
0.1392 | 0.3638 | 540 | 1.1570 | 28912216 |
0.1562 | 0.3672 | 545 | 1.1615 | 29179632 |
0.1561 | 0.3705 | 550 | 1.1592 | 29441400 |
0.1788 | 0.3739 | 555 | 1.1544 | 29704448 |
0.1415 | 0.3773 | 560 | 1.1568 | 29974672 |
0.127 | 0.3806 | 565 | 1.1549 | 30243352 |
0.2107 | 0.3840 | 570 | 1.1510 | 30508072 |
0.1542 | 0.3874 | 575 | 1.1537 | 30775520 |
0.1732 | 0.3908 | 580 | 1.1535 | 31045640 |
0.1384 | 0.3941 | 585 | 1.1523 | 31315088 |
0.1583 | 0.3975 | 590 | 1.1610 | 31584512 |
0.1308 | 0.4009 | 595 | 1.1523 | 31847184 |
0.1238 | 0.4042 | 600 | 1.1490 | 32117736 |
0.1563 | 0.4076 | 605 | 1.1545 | 32387440 |
0.1523 | 0.4110 | 610 | 1.1536 | 32654336 |
0.163 | 0.4143 | 615 | 1.1499 | 32915536 |
0.1577 | 0.4177 | 620 | 1.1522 | 33189624 |
0.1466 | 0.4211 | 625 | 1.1515 | 33456240 |
0.1712 | 0.4244 | 630 | 1.1515 | 33716600 |
0.2426 | 0.4278 | 635 | 1.1503 | 33977128 |
0.1249 | 0.4312 | 640 | 1.1469 | 34244400 |
0.1231 | 0.4345 | 645 | 1.1485 | 34511176 |
0.1404 | 0.4379 | 650 | 1.1521 | 34781096 |
0.1544 | 0.4413 | 655 | 1.1488 | 35042016 |
0.1284 | 0.4447 | 660 | 1.1466 | 35307168 |
0.1682 | 0.4480 | 665 | 1.1485 | 35577240 |
0.1634 | 0.4514 | 670 | 1.1464 | 35842592 |
0.1544 | 0.4548 | 675 | 1.1471 | 36116256 |
0.1744 | 0.4581 | 680 | 1.1474 | 36383960 |
0.1336 | 0.4615 | 685 | 1.1454 | 36650208 |
0.1662 | 0.4649 | 690 | 1.1425 | 36917720 |
0.1898 | 0.4682 | 695 | 1.1435 | 37186632 |
0.0924 | 0.4716 | 700 | 1.1453 | 37448248 |
0.1654 | 0.4750 | 705 | 1.1409 | 37719752 |
0.153 | 0.4783 | 710 | 1.1423 | 37992096 |
0.1598 | 0.4817 | 715 | 1.1431 | 38261872 |
0.1656 | 0.4851 | 720 | 1.1393 | 38522568 |
0.1701 | 0.4884 | 725 | 1.1420 | 38788408 |
0.1177 | 0.4918 | 730 | 1.1427 | 39060104 |
0.2274 | 0.4952 | 735 | 1.1389 | 39322640 |
0.1782 | 0.4985 | 740 | 1.1393 | 39584536 |
0.1255 | 0.5019 | 745 | 1.1410 | 39847800 |
0.136 | 0.5053 | 750 | 1.1396 | 40108760 |
0.1142 | 0.5087 | 755 | 1.1399 | 40370560 |
0.1623 | 0.5120 | 760 | 1.1378 | 40642736 |
0.0867 | 0.5154 | 765 | 1.1428 | 40913808 |
0.0959 | 0.5188 | 770 | 1.1424 | 41170200 |
0.1068 | 0.5221 | 775 | 1.1366 | 41440664 |
0.1886 | 0.5255 | 780 | 1.1397 | 41702144 |
0.2629 | 0.5289 | 785 | 1.1402 | 41964416 |
0.1679 | 0.5322 | 790 | 1.1374 | 42226728 |
0.135 | 0.5356 | 795 | 1.1387 | 42490896 |
0.1195 | 0.5390 | 800 | 1.1387 | 42761760 |
0.1197 | 0.5423 | 805 | 1.1381 | 43022088 |
0.1871 | 0.5457 | 810 | 1.1357 | 43294296 |
0.1538 | 0.5491 | 815 | 1.1367 | 43568856 |
0.1594 | 0.5524 | 820 | 1.1350 | 43837368 |
0.2219 | 0.5558 | 825 | 1.1350 | 44108216 |
0.0933 | 0.5592 | 830 | 1.1324 | 44381984 |
0.1476 | 0.5626 | 835 | 1.1348 | 44641080 |
0.17 | 0.5659 | 840 | 1.1333 | 44912056 |
0.1086 | 0.5693 | 845 | 1.1331 | 45179016 |
0.1518 | 0.5727 | 850 | 1.1348 | 45448544 |
0.0875 | 0.5760 | 855 | 1.1325 | 45711912 |
0.1788 | 0.5794 | 860 | 1.1332 | 45981496 |
0.1095 | 0.5828 | 865 | 1.1351 | 46252912 |
0.1221 | 0.5861 | 870 | 1.1319 | 46513152 |
0.1307 | 0.5895 | 875 | 1.1365 | 46780264 |
0.1303 | 0.5929 | 880 | 1.1379 | 47045720 |
0.1299 | 0.5962 | 885 | 1.1336 | 47312048 |
0.1383 | 0.5996 | 890 | 1.1338 | 47576792 |
0.1171 | 0.6030 | 895 | 1.1335 | 47850232 |
0.1482 | 0.6063 | 900 | 1.1302 | 48126760 |
0.1455 | 0.6097 | 905 | 1.1296 | 48399400 |
0.2177 | 0.6131 | 910 | 1.1315 | 48665784 |
0.1534 | 0.6164 | 915 | 1.1295 | 48926648 |
0.1072 | 0.6198 | 920 | 1.1307 | 49192408 |
0.1584 | 0.6232 | 925 | 1.1325 | 49455304 |
0.1119 | 0.6266 | 930 | 1.1302 | 49725344 |
0.1319 | 0.6299 | 935 | 1.1321 | 49986584 |
0.1688 | 0.6333 | 940 | 1.1305 | 50257056 |
0.1062 | 0.6367 | 945 | 1.1306 | 50529728 |
0.1567 | 0.6400 | 950 | 1.1318 | 50803152 |
0.1352 | 0.6434 | 955 | 1.1283 | 51065616 |
0.146 | 0.6468 | 960 | 1.1307 | 51330472 |
0.2095 | 0.6501 | 965 | 1.1303 | 51597440 |
0.1321 | 0.6535 | 970 | 1.1283 | 51868448 |
0.1608 | 0.6569 | 975 | 1.1287 | 52141152 |
0.1166 | 0.6602 | 980 | 1.1280 | 52406200 |
0.0847 | 0.6636 | 985 | 1.1287 | 52674960 |
0.1894 | 0.6670 | 990 | 1.1318 | 52940264 |
0.169 | 0.6703 | 995 | 1.1276 | 53208664 |
0.1393 | 0.6737 | 1000 | 1.1248 | 53467832 |
0.2606 | 0.6771 | 1005 | 1.1272 | 53739312 |
0.1588 | 0.6804 | 1010 | 1.1292 | 54000424 |
0.148 | 0.6838 | 1015 | 1.1300 | 54272248 |
0.1254 | 0.6872 | 1020 | 1.1290 | 54536880 |
0.169 | 0.6906 | 1025 | 1.1260 | 54811904 |
0.1179 | 0.6939 | 1030 | 1.1262 | 55079712 |
0.1874 | 0.6973 | 1035 | 1.1282 | 55354032 |
0.126 | 0.7007 | 1040 | 1.1280 | 55621064 |
0.1321 | 0.7040 | 1045 | 1.1270 | 55881304 |
0.2101 | 0.7074 | 1050 | 1.1262 | 56150136 |
0.1423 | 0.7108 | 1055 | 1.1258 | 56415536 |
0.0638 | 0.7141 | 1060 | 1.1280 | 56689536 |
0.1545 | 0.7175 | 1065 | 1.1271 | 56956312 |
0.2122 | 0.7209 | 1070 | 1.1226 | 57226432 |
0.096 | 0.7242 | 1075 | 1.1255 | 57491144 |
0.1338 | 0.7276 | 1080 | 1.1278 | 57756376 |
0.188 | 0.7310 | 1085 | 1.1254 | 58022576 |
0.0907 | 0.7343 | 1090 | 1.1252 | 58297376 |
0.1706 | 0.7377 | 1095 | 1.1246 | 58568136 |
0.0927 | 0.7411 | 1100 | 1.1232 | 58836160 |
0.101 | 0.7445 | 1105 | 1.1260 | 59101048 |
0.1179 | 0.7478 | 1110 | 1.1244 | 59364432 |
0.1297 | 0.7512 | 1115 | 1.1246 | 59622392 |
0.1368 | 0.7546 | 1120 | 1.1251 | 59882352 |
0.1019 | 0.7579 | 1125 | 1.1260 | 60150856 |
0.1502 | 0.7613 | 1130 | 1.1270 | 60418952 |
0.1193 | 0.7647 | 1135 | 1.1246 | 60687440 |
0.1362 | 0.7680 | 1140 | 1.1253 | 60959592 |
0.1215 | 0.7714 | 1145 | 1.1249 | 61226264 |
0.1576 | 0.7748 | 1150 | 1.1224 | 61489320 |
0.1421 | 0.7781 | 1155 | 1.1225 | 61757256 |
0.1126 | 0.7815 | 1160 | 1.1227 | 62023952 |
0.1095 | 0.7849 | 1165 | 1.1221 | 62287448 |
0.1321 | 0.7882 | 1170 | 1.1209 | 62554936 |
0.1498 | 0.7916 | 1175 | 1.1216 | 62819376 |
0.1028 | 0.7950 | 1180 | 1.1200 | 63091120 |
0.1178 | 0.7983 | 1185 | 1.1220 | 63368872 |
0.1483 | 0.8017 | 1190 | 1.1215 | 63623128 |
0.1762 | 0.8051 | 1195 | 1.1202 | 63888840 |
0.1158 | 0.8085 | 1200 | 1.1200 | 64158928 |
0.1189 | 0.8118 | 1205 | 1.1206 | 64430672 |
0.1609 | 0.8152 | 1210 | 1.1210 | 64695664 |
0.1314 | 0.8186 | 1215 | 1.1209 | 64958040 |
0.1874 | 0.8219 | 1220 | 1.1222 | 65229264 |
0.0996 | 0.8253 | 1225 | 1.1221 | 65493608 |
0.1751 | 0.8287 | 1230 | 1.1245 | 65763136 |
0.0866 | 0.8320 | 1235 | 1.1233 | 66029016 |
0.0764 | 0.8354 | 1240 | 1.1215 | 66305000 |
0.0992 | 0.8388 | 1245 | 1.1233 | 66573920 |
0.1284 | 0.8421 | 1250 | 1.1238 | 66838376 |
0.2141 | 0.8455 | 1255 | 1.1202 | 67109992 |
0.1298 | 0.8489 | 1260 | 1.1224 | 67378200 |
0.1407 | 0.8522 | 1265 | 1.1230 | 67654376 |
0.1795 | 0.8556 | 1270 | 1.1205 | 67923456 |
0.1223 | 0.8590 | 1275 | 1.1193 | 68195000 |
0.1079 | 0.8624 | 1280 | 1.1211 | 68468936 |
0.1897 | 0.8657 | 1285 | 1.1190 | 68732136 |
0.1639 | 0.8691 | 1290 | 1.1172 | 68996872 |
0.1757 | 0.8725 | 1295 | 1.1205 | 69262240 |
0.1413 | 0.8758 | 1300 | 1.1209 | 69532056 |
0.1653 | 0.8792 | 1305 | 1.1191 | 69793216 |
0.1735 | 0.8826 | 1310 | 1.1178 | 70059864 |
0.1297 | 0.8859 | 1315 | 1.1159 | 70327328 |
0.1241 | 0.8893 | 1320 | 1.1194 | 70596624 |
0.1343 | 0.8927 | 1325 | 1.1209 | 70861864 |
0.1642 | 0.8960 | 1330 | 1.1165 | 71127168 |
0.2015 | 0.8994 | 1335 | 1.1159 | 71397272 |
0.1633 | 0.9028 | 1340 | 1.1173 | 71666120 |
0.1352 | 0.9061 | 1345 | 1.1190 | 71931128 |
0.1171 | 0.9095 | 1350 | 1.1188 | 72189232 |
0.1459 | 0.9129 | 1355 | 1.1189 | 72460496 |
0.0966 | 0.9162 | 1360 | 1.1171 | 72726432 |
0.1652 | 0.9196 | 1365 | 1.1153 | 72986368 |
0.1064 | 0.9230 | 1370 | 1.1157 | 73244768 |
0.0793 | 0.9264 | 1375 | 1.1172 | 73516176 |
0.1431 | 0.9297 | 1380 | 1.1163 | 73787680 |
0.2121 | 0.9331 | 1385 | 1.1146 | 74054312 |
0.1048 | 0.9365 | 1390 | 1.1152 | 74322400 |
0.1429 | 0.9398 | 1395 | 1.1174 | 74592640 |
0.1361 | 0.9432 | 1400 | 1.1146 | 74852400 |
0.1635 | 0.9466 | 1405 | 1.1125 | 75124688 |
0.1062 | 0.9499 | 1410 | 1.1163 | 75393736 |
0.1598 | 0.9533 | 1415 | 1.1164 | 75662256 |
0.1458 | 0.9567 | 1420 | 1.1168 | 75930104 |
0.1704 | 0.9600 | 1425 | 1.1196 | 76196760 |
0.134 | 0.9634 | 1430 | 1.1177 | 76463992 |
0.1194 | 0.9668 | 1435 | 1.1148 | 76725792 |
0.1131 | 0.9701 | 1440 | 1.1156 | 76995592 |
0.1135 | 0.9735 | 1445 | 1.1158 | 77259528 |
0.1403 | 0.9769 | 1450 | 1.1150 | 77520320 |
0.0903 | 0.9803 | 1455 | 1.1131 | 77782712 |
0.1369 | 0.9836 | 1460 | 1.1147 | 78040760 |
0.1089 | 0.9870 | 1465 | 1.1176 | 78324080 |
0.1288 | 0.9904 | 1470 | 1.1166 | 78580464 |
0.1442 | 0.9937 | 1475 | 1.1129 | 78847936 |
0.0878 | 0.9971 | 1480 | 1.1137 | 79114280 |
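The log above is easier to analyze once parsed. The following is a minimal sketch that reads pipe-separated rows, finds the evaluation step with the lowest validation loss, and estimates the average number of input tokens per optimizer step. Only six rows are copied here to keep the example short, so the result describes those rows only; paste the full table into `rows_text` to analyze the whole run. The helper `parse_rows` is a hypothetical name, not part of any library.

```python
# A few rows copied from the table above.
rows_text = """\
No log | 0 | 0 | 1.3956 | 0 |
1.6259 | 0.0034 | 5 | 1.3948 | 269192 |
1.3486 | 0.0269 | 40 | 1.1784 | 2156656 |
0.6429 | 0.0472 | 70 | 1.2774 | 3764712 |
0.1635 | 0.9466 | 1405 | 1.1125 | 75124688 |
0.0878 | 0.9971 | 1480 | 1.1137 | 79114280 |
"""


def parse_rows(text):
    """Turn pipe-separated log rows into dictionaries."""
    parsed = []
    for line in text.strip().splitlines():
        # The trailing pipe yields an empty sixth cell, so keep the first five.
        cells = [cell.strip() for cell in line.split("|")]
        train_loss, epoch, step, val_loss, tokens = cells[:5]
        parsed.append({
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return parsed


rows = parse_rows(rows_text)

# Row with the lowest validation loss among the rows provided.
best = min(rows, key=lambda r: r["val_loss"])

# Average input tokens consumed per optimizer step, using the last row.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

print(best["step"], best["val_loss"])
print(round(tokens_per_step))
```

In these sample rows the training loss falls from 1.3486 at step 40 to 0.6429 at step 70 while the validation loss rises from 1.1784 to 1.2774, which is why the validation column is the one used to pick the best step.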
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1