---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1163
- Num Input Tokens Seen: 79329904

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6259 | 0.0034 | 5 | 1.3948 | 269192 |
| 1.7205 | 0.0067 | 10 | 1.3856 | 535744 |
| 1.6747 | 0.0101 | 15 | 1.3559 | 807760 |
| 1.6258 | 0.0135 | 20 | 1.3143 | 1073304 |
| 1.4352 | 0.0168 | 25 | 1.2657 | 1347048 |
| 1.3674 | 0.0202 | 30 | 1.2317 | 1615280 |
| 1.3574 | 0.0236 | 35 | 1.1989 | 1885648 |
| 1.3486 | 0.0269 | 40 | 1.1784 | 2156656 |
| 1.2541 | 0.0303 | 45 | 1.1793 | 2421504 |
| 1.1753 | 0.0337 | 50 | 1.1790 | 2687560 |
| 1.0309 | 0.0371 | 55 | 1.1886 | 2951648 |
| 0.9656 | 0.0404 | 60 | 1.2106 | 3227216 |
| 0.8768 | 0.0438 | 65 | 1.2146 | 3497872 |
| 0.6429 | 0.0472 | 70 | 1.2774 | 3764712 |
| 0.7068 | 0.0505 | 75 | 1.2819 | 4030952 |
| 0.6668 | 0.0539 | 80 | 1.2590 | 4301160 |
| 0.5531 | 0.0573 | 85 | 1.2669 | 4568312 |
| 0.418 | 0.0606 | 90 | 1.2660 | 4834208 |
| 0.351 | 0.0640 | 95 | 1.2736 | 5100584 |
| 0.4759 | 0.0674 | 100 | 1.2518 | 5363000 |
| 0.3357 | 0.0707 | 105 | 1.2374 | 5633512 |
| 0.4113 | 0.0741 | 110 | 1.2428 | 5897448 |
| 0.3705 | 0.0775 | 115 | 1.2439 | 6165216 |
| 0.2944 | 0.0808 | 120 | 1.2403 | 6424248 |
| 0.3285 | 0.0842 | 125 | 1.2456 | 6681496 |
| 0.3726 | 0.0876 | 130 | 1.2270 | 6952128 |
| 0.2679 | 0.0910 | 135 | 1.2238 | 7220384 |
| 0.252 | 0.0943 | 140 | 1.2258 | 7487104 |
| 0.3314 | 0.0977 | 145 | 1.2162 | 7763376 |
| 0.2311 | 0.1011 | 150 | 1.2244 | 8032536 |
| 0.3076 | 0.1044 | 155 | 1.2139 | 8303104 |
| 0.2792 | 0.1078 | 160 | 1.2208 | 8566312 |
| 0.2223 | 0.1112 | 165 | 1.2174 | 8842536 |
| 0.2285 | 0.1145 | 170 | 1.2217 | 9101304 |
| 0.2407 | 0.1179 | 175 | 1.2076 | 9369776 |
| 0.2957 | 0.1213 | 180 | 1.2121 | 9634840 |
| 0.3073 | 0.1246 | 185 | 1.2132 | 9899744 |
| 0.2155 | 0.1280 | 190 | 1.2084 | 10168352 |
| 0.206 | 0.1314 | 195 | 1.2147 | 10440512 |
| 0.2419 | 0.1347 | 200 | 1.2024 | 10707544 |
| 0.1884 | 0.1381 | 205 | 1.2068 | 10975600 |
| 0.1609 | 0.1415 | 210 | 1.2079 | 11242968 |
| 0.2126 | 0.1448 | 215 | 1.2093 | 11511848 |
| 0.2478 | 0.1482 | 220 | 1.2074 | 11783608 |
| 0.2409 | 0.1516 | 225 | 1.2116 | 12054688 |
| 0.2337 | 0.1550 | 230 | 1.2080 | 12318472 |
| 0.1825 | 0.1583 | 235 | 1.2044 | 12595544 |
| 0.2339 | 0.1617 | 240 | 1.2087 | 12861552 |
| 0.208 | 0.1651 | 245 | 1.2029 | 13125424 |
| 0.17 | 0.1684 | 250 | 1.2086 | 13400288 |
| 0.1689 | 0.1718 | 255 | 1.1979 | 13669320 |
| 0.2218 | 0.1752 | 260 | 1.1995 | 13933120 |
| 0.2991 | 0.1785 | 265 | 1.2025 | 14196528 |
| 0.1689 | 0.1819 | 270 | 1.2027 | 14457312 |
| 0.1555 | 0.1853 | 275 | 1.2016 | 14729752 |
| 0.2401 | 0.1886 | 280 | 1.2039 | 14996792 |
| 0.1565 | 0.1920 | 285 | 1.1978 | 15263432 |
| 0.1755 | 0.1954 | 290 | 1.1972 | 15524472 |
| 0.2002 | 0.1987 | 295 | 1.1961 | 15796048 |
| 0.1917 | 0.2021 | 300 | 1.1959 | 16062768 |
| 0.1862 | 0.2055 | 305 | 1.2027 | 16340792 |
| 0.2141 | 0.2089 | 310 | 1.1931 | 16617552 |
| 0.214 | 0.2122 | 315 | 1.1955 | 16882528 |
| 0.1397 | 0.2156 | 320 | 1.2007 | 17143744 |
| 0.2168 | 0.2190 | 325 | 1.1883 | 17411360 |
| 0.118 | 0.2223 | 330 | 1.1881 | 17675528 |
| 0.1473 | 0.2257 | 335 | 1.2000 | 17939752 |
| 0.1937 | 0.2291 | 340 | 1.1878 | 18206936 |
| 0.1524 | 0.2324 | 345 | 1.1905 | 18474736 |
| 0.2195 | 0.2358 | 350 | 1.1886 | 18743480 |
| 0.1622 | 0.2392 | 355 | 1.1857 | 19012232 |
| 0.2023 | 0.2425 | 360 | 1.1899 | 19276600 |
| 0.1584 | 0.2459 | 365 | 1.1858 | 19551832 |
| 0.1618 | 0.2493 | 370 | 1.1871 | 19827848 |
| 0.0807 | 0.2526 | 375 | 1.1862 | 20094104 |
| 0.1594 | 0.2560 | 380 | 1.1878 | 20360880 |
| 0.1828 | 0.2594 | 385 | 1.1860 | 20629120 |
| 0.237 | 0.2627 | 390 | 1.1832 | 20897088 |
| 0.1511 | 0.2661 | 395 | 1.1786 | 21163952 |
| 0.1626 | 0.2695 | 400 | 1.1789 | 21434224 |
| 0.1721 | 0.2729 | 405 | 1.1772 | 21704128 |
| 0.1283 | 0.2762 | 410 | 1.1768 | 21973760 |
| 0.1534 | 0.2796 | 415 | 1.1766 | 22243480 |
| 0.1461 | 0.2830 | 420 | 1.1790 | 22512848 |
| 0.0818 | 0.2863 | 425 | 1.1775 | 22775944 |
| 0.2356 | 0.2897 | 430 | 1.1781 | 23042216 |
| 0.1638 | 0.2931 | 435 | 1.1765 | 23311872 |
| 0.1251 | 0.2964 | 440 | 1.1715 | 23583792 |
| 0.2321 | 0.2998 | 445 | 1.1699 | 23852472 |
| 0.2209 | 0.3032 | 450 | 1.1726 | 24121504 |
| 0.1771 | 0.3065 | 455 | 1.1655 | 24388704 |
| 0.1644 | 0.3099 | 460 | 1.1713 | 24657496 |
| 0.1551 | 0.3133 | 465 | 1.1684 | 24921776 |
| 0.1607 | 0.3166 | 470 | 1.1673 | 25186256 |
| 0.0927 | 0.3200 | 475 | 1.1677 | 25456584 |
| 0.1523 | 0.3234 | 480 | 1.1645 | 25725384 |
| 0.1665 | 0.3268 | 485 | 1.1661 | 25997384 |
| 0.209 | 0.3301 | 490 | 1.1679 | 26258576 |
| 0.1595 | 0.3335 | 495 | 1.1630 | 26524160 |
| 0.1801 | 0.3369 | 500 | 1.1627 | 26790776 |
| 0.1483 | 0.3402 | 505 | 1.1632 | 27054416 |
| 0.1868 | 0.3436 | 510 | 1.1590 | 27322560 |
| 0.1118 | 0.3470 | 515 | 1.1624 | 27587584 |
| 0.1938 | 0.3503 | 520 | 1.1631 | 27851280 |
| 0.1447 | 0.3537 | 525 | 1.1569 | 28118456 |
| 0.1759 | 0.3571 | 530 | 1.1613 | 28376264 |
| 0.207 | 0.3604 | 535 | 1.1612 | 28642960 |
| 0.1392 | 0.3638 | 540 | 1.1570 | 28912216 |
| 0.1562 | 0.3672 | 545 | 1.1615 | 29179632 |
| 0.1561 | 0.3705 | 550 | 1.1592 | 29441400 |
| 0.1788 | 0.3739 | 555 | 1.1544 | 29704448 |
| 0.1415 | 0.3773 | 560 | 1.1568 | 29974672 |
| 0.127 | 0.3806 | 565 | 1.1549 | 30243352 |
| 0.2107 | 0.3840 | 570 | 1.1510 | 30508072 |
| 0.1542 | 0.3874 | 575 | 1.1537 | 30775520 |
| 0.1732 | 0.3908 | 580 | 1.1535 | 31045640 |
| 0.1384 | 0.3941 | 585 | 1.1523 | 31315088 |
| 0.1583 | 0.3975 | 590 | 1.1610 | 31584512 |
| 0.1308 | 0.4009 | 595 | 1.1523 | 31847184 |
| 0.1238 | 0.4042 | 600 | 1.1490 | 32117736 |
| 0.1563 | 0.4076 | 605 | 1.1545 | 32387440 |
| 0.1523 | 0.4110 | 610 | 1.1536 | 32654336 |
| 0.163 | 0.4143 | 615 | 1.1499 | 32915536 |
| 0.1577 | 0.4177 | 620 | 1.1522 | 33189624 |
| 0.1466 | 0.4211 | 625 | 1.1515 | 33456240 |
| 0.1712 | 0.4244 | 630 | 1.1515 | 33716600 |
| 0.2426 | 0.4278 | 635 | 1.1503 | 33977128 |
| 0.1249 | 0.4312 | 640 | 1.1469 | 34244400 |
| 0.1231 | 0.4345 | 645 | 1.1485 | 34511176 |
| 0.1404 | 0.4379 | 650 | 1.1521 | 34781096 |
| 0.1544 | 0.4413 | 655 | 1.1488 | 35042016 |
| 0.1284 | 0.4447 | 660 | 1.1466 | 35307168 |
| 0.1682 | 0.4480 | 665 | 1.1485 | 35577240 |
| 0.1634 | 0.4514 | 670 | 1.1464 | 35842592 |
| 0.1544 | 0.4548 | 675 | 1.1471 | 36116256 |
| 0.1744 | 0.4581 | 680 | 1.1474 | 36383960 |
| 0.1336 | 0.4615 | 685 | 1.1454 | 36650208 |
| 0.1662 | 0.4649 | 690 | 1.1425 | 36917720 |
| 0.1898 | 0.4682 | 695 | 1.1435 | 37186632 |
| 0.0924 | 0.4716 | 700 | 1.1453 | 37448248 |
| 0.1654 | 0.4750 | 705 | 1.1409 | 37719752 |
| 0.153 | 0.4783 | 710 | 1.1423 | 37992096 |
| 0.1598 | 0.4817 | 715 | 1.1431 | 38261872 |
| 0.1656 | 0.4851 | 720 | 1.1393 | 38522568 |
| 0.1701 | 0.4884 | 725 | 1.1420 | 38788408 |
| 0.1177 | 0.4918 | 730 | 1.1427 | 39060104 |
| 0.2274 | 0.4952 | 735 | 1.1389 | 39322640 |
| 0.1782 | 0.4985 | 740 | 1.1393 | 39584536 |
| 0.1255 | 0.5019 | 745 | 1.1410 | 39847800 |
| 0.136 | 0.5053 | 750 | 1.1396 | 40108760 |
| 0.1142 | 0.5087 | 755 | 1.1399 | 40370560 |
| 0.1623 | 0.5120 | 760 | 1.1378 | 40642736 |
| 0.0867 | 0.5154 | 765 | 1.1428 | 40913808 |
| 0.0959 | 0.5188 | 770 | 1.1424 | 41170200 |
| 0.1068 | 0.5221 | 775 | 1.1366 | 41440664 |
| 0.1886 | 0.5255 | 780 | 1.1397 | 41702144 |
| 0.2629 | 0.5289 | 785 | 1.1402 | 41964416 |
| 0.1679 | 0.5322 | 790 | 1.1374 | 42226728 |
| 0.135 | 0.5356 | 795 | 1.1387 | 42490896 |
| 0.1195 | 0.5390 | 800 | 1.1387 | 42761760 |
| 0.1197 | 0.5423 | 805 | 1.1381 | 43022088 |
| 0.1871 | 0.5457 | 810 | 1.1357 | 43294296 |
| 0.1538 | 0.5491 | 815 | 1.1367 | 43568856 |
| 0.1594 | 0.5524 | 820 | 1.1350 | 43837368 |
| 0.2219 | 0.5558 | 825 | 1.1350 | 44108216 |
| 0.0933 | 0.5592 | 830 | 1.1324 | 44381984 |
| 0.1476 | 0.5626 | 835 | 1.1348 | 44641080 |
| 0.17 | 0.5659 | 840 | 1.1333 | 44912056 |
| 0.1086 | 0.5693 | 845 | 1.1331 | 45179016 |
| 0.1518 | 0.5727 | 850 | 1.1348 | 45448544 |
| 0.0875 | 0.5760 | 855 | 1.1325 | 45711912 |
| 0.1788 | 0.5794 | 860 | 1.1332 | 45981496 |
| 0.1095 | 0.5828 | 865 | 1.1351 | 46252912 |
| 0.1221 | 0.5861 | 870 | 1.1319 | 46513152 |
| 0.1307 | 0.5895 | 875 | 1.1365 | 46780264 |
| 0.1303 | 0.5929 | 880 | 1.1379 | 47045720 |
| 0.1299 | 0.5962 | 885 | 1.1336 | 47312048 |
| 0.1383 | 0.5996 | 890 | 1.1338 | 47576792 |
| 0.1171 | 0.6030 | 895 | 1.1335 | 47850232 |
| 0.1482 | 0.6063 | 900 | 1.1302 | 48126760 |
| 0.1455 | 0.6097 | 905 | 1.1296 | 48399400 |
| 0.2177 | 0.6131 | 910 | 1.1315 | 48665784 |
| 0.1534 | 0.6164 | 915 | 1.1295 | 48926648 |
| 0.1072 | 0.6198 | 920 | 1.1307 | 49192408 |
| 0.1584 | 0.6232 | 925 | 1.1325 | 49455304 |
| 0.1119 | 0.6266 | 930 | 1.1302 | 49725344 |
| 0.1319 | 0.6299 | 935 | 1.1321 | 49986584 |
| 0.1688 | 0.6333 | 940 | 1.1305 | 50257056 |
| 0.1062 | 0.6367 | 945 | 1.1306 | 50529728 |
| 0.1567 | 0.6400 | 950 | 1.1318 | 50803152 |
| 0.1352 | 0.6434 | 955 | 1.1283 | 51065616 |
| 0.146 | 0.6468 | 960 | 1.1307 | 51330472 |
| 0.2095 | 0.6501 | 965 | 1.1303 | 51597440 |
| 0.1321 | 0.6535 | 970 | 1.1283 | 51868448 |
| 0.1608 | 0.6569 | 975 | 1.1287 | 52141152 |
| 0.1166 | 0.6602 | 980 | 1.1280 | 52406200 |
| 0.0847 | 0.6636 | 985 | 1.1287 | 52674960 |
| 0.1894 | 0.6670 | 990 | 1.1318 | 52940264 |
| 0.169 | 0.6703 | 995 | 1.1276 | 53208664 |
| 0.1393 | 0.6737 | 1000 | 1.1248 | 53467832 |
| 0.2606 | 0.6771 | 1005 | 1.1272 | 53739312 |
| 0.1588 | 0.6804 | 1010 | 1.1292 | 54000424 |
| 0.148 | 0.6838 | 1015 | 1.1300 | 54272248 |
| 0.1254 | 0.6872 | 1020 | 1.1290 | 54536880 |
| 0.169 | 0.6906 | 1025 | 1.1260 | 54811904 |
| 0.1179 | 0.6939 | 1030 | 1.1262 | 55079712 |
| 0.1874 | 0.6973 | 1035 | 1.1282 | 55354032 |
| 0.126 | 0.7007 | 1040 | 1.1280 | 55621064 |
| 0.1321 | 0.7040 | 1045 | 1.1270 | 55881304 |
| 0.2101 | 0.7074 | 1050 | 1.1262 | 56150136 |
| 0.1423 | 0.7108 | 1055 | 1.1258 | 56415536 |
| 0.0638 | 0.7141 | 1060 | 1.1280 | 56689536 |
| 0.1545 | 0.7175 | 1065 | 1.1271 | 56956312 |
| 0.2122 | 0.7209 | 1070 | 1.1226 | 57226432 |
| 0.096 | 0.7242 | 1075 | 1.1255 | 57491144 |
| 0.1338 | 0.7276 | 1080 | 1.1278 | 57756376 |
| 0.188 | 0.7310 | 1085 | 1.1254 | 58022576 |
| 0.0907 | 0.7343 | 1090 | 1.1252 | 58297376 |
| 0.1706 | 0.7377 | 1095 | 1.1246 | 58568136 |
| 0.0927 | 0.7411 | 1100 | 1.1232 | 58836160 |
| 0.101 | 0.7445 | 1105 | 1.1260 | 59101048 |
| 0.1179 | 0.7478 | 1110 | 1.1244 | 59364432 |
| 0.1297 | 0.7512 | 1115 | 1.1246 | 59622392 |
| 0.1368 | 0.7546 | 1120 | 1.1251 | 59882352 |
| 0.1019 | 0.7579 | 1125 | 1.1260 | 60150856 |
| 0.1502 | 0.7613 | 1130 | 1.1270 | 60418952 |
| 0.1193 | 0.7647 | 1135 | 1.1246 | 60687440 |
| 0.1362 | 0.7680 | 1140 | 1.1253 | 60959592 |
| 0.1215 | 0.7714 | 1145 | 1.1249 | 61226264 |
| 0.1576 | 0.7748 | 1150 | 1.1224 | 61489320 |
| 0.1421 | 0.7781 | 1155 | 1.1225 | 61757256 |
| 0.1126 | 0.7815 | 1160 | 1.1227 | 62023952 |
| 0.1095 | 0.7849 | 1165 | 1.1221 | 62287448 |
| 0.1321 | 0.7882 | 1170 | 1.1209 | 62554936 |
| 0.1498 | 0.7916 | 1175 | 1.1216 | 62819376 |
| 0.1028 | 0.7950 | 1180 | 1.1200 | 63091120 |
| 0.1178 | 0.7983 | 1185 | 1.1220 | 63368872 |
| 0.1483 | 0.8017 | 1190 | 1.1215 | 63623128 |
| 0.1762 | 0.8051 | 1195 | 1.1202 | 63888840 |
| 0.1158 | 0.8085 | 1200 | 1.1200 | 64158928 |
| 0.1189 | 0.8118 | 1205 | 1.1206 | 64430672 |
| 0.1609 | 0.8152 | 1210 | 1.1210 | 64695664 |
| 0.1314 | 0.8186 | 1215 | 1.1209 | 64958040 |
| 0.1874 | 0.8219 | 1220 | 1.1222 | 65229264 |
| 0.0996 | 0.8253 | 1225 | 1.1221 | 65493608 |
| 0.1751 | 0.8287 | 1230 | 1.1245 | 65763136 |
| 0.0866 | 0.8320 | 1235 | 1.1233 | 66029016 |
| 0.0764 | 0.8354 | 1240 | 1.1215 | 66305000 |
| 0.0992 | 0.8388 | 1245 | 1.1233 | 66573920 |
| 0.1284 | 0.8421 | 1250 | 1.1238 | 66838376 |
| 0.2141 | 0.8455 | 1255 | 1.1202 | 67109992 |
| 0.1298 | 0.8489 | 1260 | 1.1224 | 67378200 |
| 0.1407 | 0.8522 | 1265 | 1.1230 | 67654376 |
| 0.1795 | 0.8556 | 1270 | 1.1205 | 67923456 |
| 0.1223 | 0.8590 | 1275 | 1.1193 | 68195000 |
| 0.1079 | 0.8624 | 1280 | 1.1211 | 68468936 |
| 0.1897 | 0.8657 | 1285 | 1.1190 | 68732136 |
| 0.1639 | 0.8691 | 1290 | 1.1172 | 68996872 |
| 0.1757 | 0.8725 | 1295 | 1.1205 | 69262240 |
| 0.1413 | 0.8758 | 1300 | 1.1209 | 69532056 |
| 0.1653 | 0.8792 | 1305 | 1.1191 | 69793216 |
| 0.1735 | 0.8826 | 1310 | 1.1178 | 70059864 |
| 0.1297 | 0.8859 | 1315 | 1.1159 | 70327328 |
| 0.1241 | 0.8893 | 1320 | 1.1194 | 70596624 |
| 0.1343 | 0.8927 | 1325 | 1.1209 | 70861864 |
| 0.1642 | 0.8960 | 1330 | 1.1165 | 71127168 |
| 0.2015 | 0.8994 | 1335 | 1.1159 | 71397272 |
| 0.1633 | 0.9028 | 1340 | 1.1173 | 71666120 |
| 0.1352 | 0.9061 | 1345 | 1.1190 | 71931128 |
| 0.1171 | 0.9095 | 1350 | 1.1188 | 72189232 |
| 0.1459 | 0.9129 | 1355 | 1.1189 | 72460496 |
| 0.0966 | 0.9162 | 1360 | 1.1171 | 72726432 |
| 0.1652 | 0.9196 | 1365 | 1.1153 | 72986368 |
| 0.1064 | 0.9230 | 1370 | 1.1157 | 73244768 |
| 0.0793 | 0.9264 | 1375 | 1.1172 | 73516176 |
| 0.1431 | 0.9297 | 1380 | 1.1163 | 73787680 |
| 0.2121 | 0.9331 | 1385 | 1.1146 | 74054312 |
| 0.1048 | 0.9365 | 1390 | 1.1152 | 74322400 |
| 0.1429 | 0.9398 | 1395 | 1.1174 | 74592640 |
| 0.1361 | 0.9432 | 1400 | 1.1146 | 74852400 |
| 0.1635 | 0.9466 | 1405 | 1.1125 | 75124688 |
| 0.1062 | 0.9499 | 1410 | 1.1163 | 75393736 |
| 0.1598 | 0.9533 | 1415 | 1.1164 | 75662256 |
| 0.1458 | 0.9567 | 1420 | 1.1168 | 75930104 |
| 0.1704 | 0.9600 | 1425 | 1.1196 | 76196760 |
| 0.134 | 0.9634 | 1430 | 1.1177 | 76463992 |
| 0.1194 | 0.9668 | 1435 | 1.1148 | 76725792 |
| 0.1131 | 0.9701 | 1440 | 1.1156 | 76995592 |
| 0.1135 | 0.9735 | 1445 | 1.1158 | 77259528 |
| 0.1403 | 0.9769 | 1450 | 1.1150 | 77520320 |
| 0.0903 | 0.9803 | 1455 | 1.1131 | 77782712 |
| 0.1369 | 0.9836 | 1460 | 1.1147 | 78040760 |
| 0.1089 | 0.9870 | 1465 | 1.1176 | 78324080 |
| 0.1288 | 0.9904 | 1470 | 1.1166 | 78580464 |
| 0.1442 | 0.9937 | 1475 | 1.1129 | 78847936 |
| 0.0878 | 0.9971 | 1480 | 1.1137 | 79114280 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
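
The batch-size hyperparameters and the last row of the training results table can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that `total_train_batch_size` is the product of `train_batch_size`, `gradient_accumulation_steps`, and the number of devices, and that "Input Tokens Seen" grows roughly uniformly per optimizer step (it may include padding). The helper names `implied_num_devices` and `tokens_per_optimizer_step` are hypothetical and not part of any library.

```python
# Values copied from the hyperparameter list and the last table row of this card.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
last_step = 1480
tokens_at_last_step = 79_114_280


def implied_num_devices(per_device: int, accum: int, total: int) -> int:
    """Device count implied by total = per_device * accum * devices (assumed relation)."""
    devices, remainder = divmod(total, per_device * accum)
    if remainder:
        raise ValueError("total batch size is not a multiple of per_device * accum")
    return devices


def tokens_per_optimizer_step(tokens_seen: int, step: int) -> float:
    """Average number of input tokens consumed per optimizer step."""
    return tokens_seen / step


devices = implied_num_devices(
    train_batch_size, gradient_accumulation_steps, total_train_batch_size
)
tokens_per_step = tokens_per_optimizer_step(tokens_at_last_step, last_step)
# Average tokens per training sequence, assuming every step uses a full batch.
tokens_per_sequence = tokens_per_step / total_train_batch_size

print(f"implied devices: {devices}")
print(f"tokens per optimizer step: {tokens_per_step:.0f}")
print(f"tokens per sequence: {tokens_per_sequence:.1f}")
```

Under these assumptions the numbers are consistent with a single-device run, with each optimizer step consuming a little over 53,000 input tokens.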