collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1167
- Num Input Tokens Seen: 79914736
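If the loss above is a mean per-token cross-entropy measured in nats (an assumption; the card does not state the convention), it can be converted to a perplexity by exponentiation. The helper name `perplexity_from_loss` below is illustrative only.

```python
import math

# Final evaluation loss reported in this card.
EVAL_LOSS = 1.1167


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    # Assumes the reported loss is a natural-log cross-entropy averaged over tokens.
    print(f"perplexity ~ {perplexity_from_loss(EVAL_LOSS):.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of a little over 3.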
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
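The reported total_train_batch_size follows from the per-device batch size and the gradient accumulation steps. The sketch below checks that arithmetic and estimates the warmup length. It assumes a single training device (the card lists no device count), a run of roughly 1490 optimizer steps (inferred from the last logged row of the training results, step 1485 at epoch 0.9968), and that the warmup ratio is converted to steps by rounding up. None of these three points is stated in the card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
lr_scheduler_warmup_ratio = 0.05

# Assumption: one device; the card does not state the device count.
num_devices = 1

# Assumption: total optimizer steps, estimated as 1485 / 0.9968 and rounded.
approx_total_steps = 1490


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * accum_steps * devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Optimizer steps spent in warmup, rounded up (assumed convention)."""
    return math.ceil(total_steps * warmup_ratio)


if __name__ == "__main__":
    print(effective_batch_size(train_batch_size, gradient_accumulation_steps, num_devices))
    print(warmup_steps(approx_total_steps, lr_scheduler_warmup_ratio))
```

With these inputs the effective batch size comes out to 128, matching the listed total_train_batch_size, and the warmup estimate is about 75 steps.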
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6969 | 0.0034 | 5 | 1.3951 | 274520 |
1.7247 | 0.0067 | 10 | 1.3855 | 536248 |
1.6971 | 0.0101 | 15 | 1.3559 | 794656 |
1.5505 | 0.0134 | 20 | 1.3144 | 1057184 |
1.4888 | 0.0168 | 25 | 1.2676 | 1325376 |
1.4116 | 0.0201 | 30 | 1.2360 | 1590216 |
1.3821 | 0.0235 | 35 | 1.2014 | 1856792 |
1.2761 | 0.0269 | 40 | 1.1797 | 2122920 |
1.2308 | 0.0302 | 45 | 1.1840 | 2389448 |
1.0825 | 0.0336 | 50 | 1.1769 | 2656888 |
0.9702 | 0.0369 | 55 | 1.1934 | 2929040 |
0.9769 | 0.0403 | 60 | 1.2092 | 3198992 |
0.9024 | 0.0436 | 65 | 1.2254 | 3464528 |
0.7529 | 0.0470 | 70 | 1.2581 | 3739016 |
0.669 | 0.0503 | 75 | 1.2635 | 4008712 |
0.5191 | 0.0537 | 80 | 1.2575 | 4274960 |
0.542 | 0.0571 | 85 | 1.2787 | 4543312 |
0.4904 | 0.0604 | 90 | 1.2413 | 4812616 |
0.5359 | 0.0638 | 95 | 1.2510 | 5084096 |
0.4912 | 0.0671 | 100 | 1.2598 | 5346872 |
0.5011 | 0.0705 | 105 | 1.2444 | 5618744 |
0.3716 | 0.0738 | 110 | 1.2566 | 5887456 |
0.3875 | 0.0772 | 115 | 1.2413 | 6147240 |
0.3736 | 0.0806 | 120 | 1.2509 | 6419136 |
0.3425 | 0.0839 | 125 | 1.2330 | 6688224 |
0.3669 | 0.0873 | 130 | 1.2457 | 6960624 |
0.3833 | 0.0906 | 135 | 1.2355 | 7230360 |
0.3502 | 0.0940 | 140 | 1.2348 | 7495960 |
0.2222 | 0.0973 | 145 | 1.2303 | 7759832 |
0.306 | 0.1007 | 150 | 1.2309 | 8023008 |
0.2853 | 0.1040 | 155 | 1.2251 | 8291856 |
0.2142 | 0.1074 | 160 | 1.2278 | 8558584 |
0.2133 | 0.1108 | 165 | 1.2302 | 8826152 |
0.2138 | 0.1141 | 170 | 1.2144 | 9085872 |
0.2081 | 0.1175 | 175 | 1.2229 | 9356664 |
0.2293 | 0.1208 | 180 | 1.2104 | 9620688 |
0.2127 | 0.1242 | 185 | 1.2258 | 9887336 |
0.3055 | 0.1275 | 190 | 1.2244 | 10160352 |
0.2662 | 0.1309 | 195 | 1.2163 | 10426336 |
0.2016 | 0.1343 | 200 | 1.2154 | 10691600 |
0.2648 | 0.1376 | 205 | 1.2139 | 10953744 |
0.2557 | 0.1410 | 210 | 1.2189 | 11225024 |
0.1682 | 0.1443 | 215 | 1.2162 | 11482352 |
0.2824 | 0.1477 | 220 | 1.2151 | 11751184 |
0.2768 | 0.1510 | 225 | 1.2059 | 12022384 |
0.2023 | 0.1544 | 230 | 1.2179 | 12292184 |
0.1284 | 0.1577 | 235 | 1.2030 | 12551920 |
0.206 | 0.1611 | 240 | 1.2042 | 12817864 |
0.2329 | 0.1645 | 245 | 1.2060 | 13086560 |
0.1701 | 0.1678 | 250 | 1.2076 | 13363296 |
0.185 | 0.1712 | 255 | 1.2069 | 13622392 |
0.2238 | 0.1745 | 260 | 1.2045 | 13892440 |
0.2461 | 0.1779 | 265 | 1.2114 | 14170792 |
0.1719 | 0.1812 | 270 | 1.2012 | 14437568 |
0.2358 | 0.1846 | 275 | 1.2021 | 14705848 |
0.2734 | 0.1880 | 280 | 1.1977 | 14971224 |
0.2401 | 0.1913 | 285 | 1.2009 | 15234936 |
0.1394 | 0.1947 | 290 | 1.2031 | 15503568 |
0.1886 | 0.1980 | 295 | 1.2006 | 15770672 |
0.2118 | 0.2014 | 300 | 1.1991 | 16040928 |
0.2086 | 0.2047 | 305 | 1.2006 | 16307384 |
0.2199 | 0.2081 | 310 | 1.2041 | 16571672 |
0.1578 | 0.2114 | 315 | 1.1985 | 16840096 |
0.202 | 0.2148 | 320 | 1.2014 | 17107208 |
0.1779 | 0.2182 | 325 | 1.1942 | 17370312 |
0.154 | 0.2215 | 330 | 1.2002 | 17645720 |
0.2082 | 0.2249 | 335 | 1.1903 | 17918096 |
0.1433 | 0.2282 | 340 | 1.1903 | 18177488 |
0.1186 | 0.2316 | 345 | 1.1961 | 18442792 |
0.152 | 0.2349 | 350 | 1.1864 | 18713744 |
0.181 | 0.2383 | 355 | 1.1932 | 18980280 |
0.1509 | 0.2417 | 360 | 1.1896 | 19243912 |
0.1385 | 0.2450 | 365 | 1.1840 | 19498304 |
0.1692 | 0.2484 | 370 | 1.1834 | 19769968 |
0.2464 | 0.2517 | 375 | 1.1833 | 20037160 |
0.1261 | 0.2551 | 380 | 1.1865 | 20301888 |
0.1861 | 0.2584 | 385 | 1.1814 | 20573456 |
0.0906 | 0.2618 | 390 | 1.1923 | 20843208 |
0.0978 | 0.2651 | 395 | 1.1946 | 21114608 |
0.1543 | 0.2685 | 400 | 1.1790 | 21383608 |
0.226 | 0.2719 | 405 | 1.1823 | 21652128 |
0.22 | 0.2752 | 410 | 1.1884 | 21917384 |
0.1618 | 0.2786 | 415 | 1.1734 | 22186024 |
0.171 | 0.2819 | 420 | 1.1758 | 22452312 |
0.1548 | 0.2853 | 425 | 1.1857 | 22725832 |
0.167 | 0.2886 | 430 | 1.1821 | 22996512 |
0.1683 | 0.2920 | 435 | 1.1751 | 23267896 |
0.1115 | 0.2954 | 440 | 1.1812 | 23533192 |
0.2112 | 0.2987 | 445 | 1.1767 | 23802024 |
0.1973 | 0.3021 | 450 | 1.1709 | 24068536 |
0.1267 | 0.3054 | 455 | 1.1758 | 24338256 |
0.1875 | 0.3088 | 460 | 1.1755 | 24598888 |
0.193 | 0.3121 | 465 | 1.1733 | 24871376 |
0.1864 | 0.3155 | 470 | 1.1773 | 25141776 |
0.1256 | 0.3188 | 475 | 1.1738 | 25415760 |
0.1698 | 0.3222 | 480 | 1.1734 | 25683584 |
0.1905 | 0.3256 | 485 | 1.1721 | 25956480 |
0.1276 | 0.3289 | 490 | 1.1701 | 26224168 |
0.1634 | 0.3323 | 495 | 1.1690 | 26501456 |
0.1202 | 0.3356 | 500 | 1.1681 | 26775344 |
0.1286 | 0.3390 | 505 | 1.1705 | 27041632 |
0.17 | 0.3423 | 510 | 1.1692 | 27312216 |
0.1679 | 0.3457 | 515 | 1.1643 | 27584272 |
0.1749 | 0.3491 | 520 | 1.1695 | 27854528 |
0.1164 | 0.3524 | 525 | 1.1676 | 28117224 |
0.1919 | 0.3558 | 530 | 1.1632 | 28388312 |
0.1876 | 0.3591 | 535 | 1.1646 | 28657392 |
0.198 | 0.3625 | 540 | 1.1626 | 28925760 |
0.1769 | 0.3658 | 545 | 1.1616 | 29189176 |
0.1345 | 0.3692 | 550 | 1.1589 | 29452488 |
0.2572 | 0.3725 | 555 | 1.1598 | 29722936 |
0.1343 | 0.3759 | 560 | 1.1632 | 29998000 |
0.2109 | 0.3793 | 565 | 1.1627 | 30271904 |
0.1888 | 0.3826 | 570 | 1.1565 | 30533112 |
0.0746 | 0.3860 | 575 | 1.1581 | 30800688 |
0.2233 | 0.3893 | 580 | 1.1595 | 31064368 |
0.148 | 0.3927 | 585 | 1.1535 | 31333112 |
0.125 | 0.3960 | 590 | 1.1546 | 31609328 |
0.1426 | 0.3994 | 595 | 1.1606 | 31879224 |
0.1265 | 0.4028 | 600 | 1.1567 | 32143080 |
0.1714 | 0.4061 | 605 | 1.1532 | 32411296 |
0.137 | 0.4095 | 610 | 1.1551 | 32686376 |
0.2077 | 0.4128 | 615 | 1.1533 | 32952968 |
0.1127 | 0.4162 | 620 | 1.1546 | 33210696 |
0.1404 | 0.4195 | 625 | 1.1564 | 33475464 |
0.1387 | 0.4229 | 630 | 1.1538 | 33741256 |
0.1381 | 0.4262 | 635 | 1.1535 | 34008496 |
0.142 | 0.4296 | 640 | 1.1552 | 34275232 |
0.1087 | 0.4330 | 645 | 1.1530 | 34544824 |
0.1969 | 0.4363 | 650 | 1.1497 | 34813592 |
0.1426 | 0.4397 | 655 | 1.1510 | 35080320 |
0.1643 | 0.4430 | 660 | 1.1498 | 35341616 |
0.1242 | 0.4464 | 665 | 1.1492 | 35608032 |
0.1364 | 0.4497 | 670 | 1.1513 | 35880072 |
0.1892 | 0.4531 | 675 | 1.1489 | 36142592 |
0.0953 | 0.4565 | 680 | 1.1466 | 36410560 |
0.2022 | 0.4598 | 685 | 1.1493 | 36683312 |
0.1351 | 0.4632 | 690 | 1.1482 | 36949920 |
0.1406 | 0.4665 | 695 | 1.1462 | 37216136 |
0.1169 | 0.4699 | 700 | 1.1455 | 37489968 |
0.078 | 0.4732 | 705 | 1.1503 | 37758560 |
0.1696 | 0.4766 | 710 | 1.1492 | 38030016 |
0.1178 | 0.4799 | 715 | 1.1484 | 38300112 |
0.143 | 0.4833 | 720 | 1.1464 | 38568368 |
0.1707 | 0.4867 | 725 | 1.1465 | 38837120 |
0.0681 | 0.4900 | 730 | 1.1481 | 39109088 |
0.153 | 0.4934 | 735 | 1.1460 | 39373168 |
0.1103 | 0.4967 | 740 | 1.1425 | 39640360 |
0.0952 | 0.5001 | 745 | 1.1497 | 39901928 |
0.1689 | 0.5034 | 750 | 1.1488 | 40176072 |
0.1765 | 0.5068 | 755 | 1.1430 | 40447264 |
0.1576 | 0.5102 | 760 | 1.1445 | 40711480 |
0.1307 | 0.5135 | 765 | 1.1438 | 40978888 |
0.156 | 0.5169 | 770 | 1.1438 | 41244560 |
0.1372 | 0.5202 | 775 | 1.1439 | 41515032 |
0.1315 | 0.5236 | 780 | 1.1407 | 41779296 |
0.1256 | 0.5269 | 785 | 1.1418 | 42049072 |
0.1056 | 0.5303 | 790 | 1.1412 | 42319328 |
0.1188 | 0.5336 | 795 | 1.1427 | 42589408 |
0.1588 | 0.5370 | 800 | 1.1454 | 42861488 |
0.1134 | 0.5404 | 805 | 1.1456 | 43131600 |
0.0708 | 0.5437 | 810 | 1.1425 | 43399240 |
0.1954 | 0.5471 | 815 | 1.1385 | 43666360 |
0.1812 | 0.5504 | 820 | 1.1416 | 43936120 |
0.1669 | 0.5538 | 825 | 1.1404 | 44205176 |
0.1538 | 0.5571 | 830 | 1.1405 | 44476200 |
0.1614 | 0.5605 | 835 | 1.1399 | 44738808 |
0.1507 | 0.5639 | 840 | 1.1402 | 45010280 |
0.1155 | 0.5672 | 845 | 1.1398 | 45277752 |
0.1539 | 0.5706 | 850 | 1.1367 | 45541240 |
0.2072 | 0.5739 | 855 | 1.1353 | 45818304 |
0.1064 | 0.5773 | 860 | 1.1355 | 46086160 |
0.1371 | 0.5806 | 865 | 1.1371 | 46352824 |
0.1804 | 0.5840 | 870 | 1.1378 | 46625200 |
0.1438 | 0.5873 | 875 | 1.1393 | 46895288 |
0.1544 | 0.5907 | 880 | 1.1387 | 47159272 |
0.1394 | 0.5941 | 885 | 1.1375 | 47424840 |
0.1526 | 0.5974 | 890 | 1.1366 | 47684304 |
0.1577 | 0.6008 | 895 | 1.1334 | 47958144 |
0.1263 | 0.6041 | 900 | 1.1349 | 48223704 |
0.1499 | 0.6075 | 905 | 1.1385 | 48487392 |
0.1163 | 0.6108 | 910 | 1.1373 | 48758960 |
0.1612 | 0.6142 | 915 | 1.1354 | 49029832 |
0.1178 | 0.6176 | 920 | 1.1364 | 49300896 |
0.127 | 0.6209 | 925 | 1.1361 | 49566832 |
0.1553 | 0.6243 | 930 | 1.1344 | 49837136 |
0.1389 | 0.6276 | 935 | 1.1343 | 50107024 |
0.1347 | 0.6310 | 940 | 1.1368 | 50383280 |
0.1381 | 0.6343 | 945 | 1.1343 | 50650792 |
0.1109 | 0.6377 | 950 | 1.1324 | 50926640 |
0.155 | 0.6410 | 955 | 1.1328 | 51196952 |
0.114 | 0.6444 | 960 | 1.1345 | 51467480 |
0.1161 | 0.6478 | 965 | 1.1350 | 51737648 |
0.1072 | 0.6511 | 970 | 1.1311 | 52014736 |
0.1533 | 0.6545 | 975 | 1.1294 | 52282944 |
0.1592 | 0.6578 | 980 | 1.1301 | 52554656 |
0.1343 | 0.6612 | 985 | 1.1337 | 52820920 |
0.1682 | 0.6645 | 990 | 1.1305 | 53083288 |
0.139 | 0.6679 | 995 | 1.1307 | 53347344 |
0.127 | 0.6713 | 1000 | 1.1312 | 53615128 |
0.1377 | 0.6746 | 1005 | 1.1302 | 53879272 |
0.1313 | 0.6780 | 1010 | 1.1314 | 54147360 |
0.1482 | 0.6813 | 1015 | 1.1326 | 54419288 |
0.1676 | 0.6847 | 1020 | 1.1297 | 54693584 |
0.1134 | 0.6880 | 1025 | 1.1289 | 54969304 |
0.1362 | 0.6914 | 1030 | 1.1316 | 55236088 |
0.2042 | 0.6947 | 1035 | 1.1298 | 55502680 |
0.0971 | 0.6981 | 1040 | 1.1262 | 55768888 |
0.1224 | 0.7015 | 1045 | 1.1295 | 56038296 |
0.1185 | 0.7048 | 1050 | 1.1338 | 56304576 |
0.1152 | 0.7082 | 1055 | 1.1291 | 56576800 |
0.104 | 0.7115 | 1060 | 1.1261 | 56842256 |
0.1305 | 0.7149 | 1065 | 1.1266 | 57111664 |
0.1256 | 0.7182 | 1070 | 1.1282 | 57384072 |
0.1328 | 0.7216 | 1075 | 1.1280 | 57655856 |
0.0891 | 0.7250 | 1080 | 1.1263 | 57925792 |
0.0878 | 0.7283 | 1085 | 1.1266 | 58195608 |
0.2146 | 0.7317 | 1090 | 1.1283 | 58465256 |
0.1634 | 0.7350 | 1095 | 1.1258 | 58726016 |
0.159 | 0.7384 | 1100 | 1.1239 | 58999152 |
0.165 | 0.7417 | 1105 | 1.1266 | 59275840 |
0.1169 | 0.7451 | 1110 | 1.1286 | 59542128 |
0.1665 | 0.7484 | 1115 | 1.1269 | 59815952 |
0.1763 | 0.7518 | 1120 | 1.1261 | 60092208 |
0.1704 | 0.7552 | 1125 | 1.1268 | 60357632 |
0.1197 | 0.7585 | 1130 | 1.1263 | 60617328 |
0.0815 | 0.7619 | 1135 | 1.1244 | 60883240 |
0.1476 | 0.7652 | 1140 | 1.1241 | 61151384 |
0.0906 | 0.7686 | 1145 | 1.1256 | 61418744 |
0.1546 | 0.7719 | 1150 | 1.1247 | 61695424 |
0.1095 | 0.7753 | 1155 | 1.1254 | 61971352 |
0.0898 | 0.7787 | 1160 | 1.1261 | 62239416 |
0.1557 | 0.7820 | 1165 | 1.1256 | 62509944 |
0.1142 | 0.7854 | 1170 | 1.1242 | 62785888 |
0.153 | 0.7887 | 1175 | 1.1223 | 63051432 |
0.1057 | 0.7921 | 1180 | 1.1230 | 63317816 |
0.1868 | 0.7954 | 1185 | 1.1240 | 63585688 |
0.1111 | 0.7988 | 1190 | 1.1237 | 63855360 |
0.0779 | 0.8021 | 1195 | 1.1253 | 64124632 |
0.1386 | 0.8055 | 1200 | 1.1255 | 64393672 |
0.1128 | 0.8089 | 1205 | 1.1235 | 64658760 |
0.1001 | 0.8122 | 1210 | 1.1225 | 64926776 |
0.2004 | 0.8156 | 1215 | 1.1240 | 65201312 |
0.132 | 0.8189 | 1220 | 1.1205 | 65477984 |
0.1144 | 0.8223 | 1225 | 1.1190 | 65743688 |
0.18 | 0.8256 | 1230 | 1.1231 | 66010504 |
0.0937 | 0.8290 | 1235 | 1.1227 | 66284376 |
0.1341 | 0.8324 | 1240 | 1.1200 | 66556032 |
0.0779 | 0.8357 | 1245 | 1.1207 | 66823736 |
0.1115 | 0.8391 | 1250 | 1.1239 | 67097320 |
0.1752 | 0.8424 | 1255 | 1.1200 | 67367264 |
0.1697 | 0.8458 | 1260 | 1.1192 | 67636960 |
0.1928 | 0.8491 | 1265 | 1.1204 | 67905776 |
0.1331 | 0.8525 | 1270 | 1.1216 | 68180800 |
0.1204 | 0.8558 | 1275 | 1.1188 | 68450184 |
0.2257 | 0.8592 | 1280 | 1.1175 | 68725568 |
0.1613 | 0.8626 | 1285 | 1.1188 | 68988104 |
0.1309 | 0.8659 | 1290 | 1.1197 | 69258592 |
0.1111 | 0.8693 | 1295 | 1.1204 | 69524080 |
0.2131 | 0.8726 | 1300 | 1.1204 | 69796856 |
0.1878 | 0.8760 | 1305 | 1.1201 | 70060784 |
0.1568 | 0.8793 | 1310 | 1.1215 | 70335704 |
0.1105 | 0.8827 | 1315 | 1.1208 | 70610024 |
0.098 | 0.8861 | 1320 | 1.1213 | 70878840 |
0.1233 | 0.8894 | 1325 | 1.1223 | 71138304 |
0.1669 | 0.8928 | 1330 | 1.1234 | 71403880 |
0.0857 | 0.8961 | 1335 | 1.1219 | 71675616 |
0.0911 | 0.8995 | 1340 | 1.1200 | 71945192 |
0.1274 | 0.9028 | 1345 | 1.1201 | 72213080 |
0.1256 | 0.9062 | 1350 | 1.1213 | 72472848 |
0.1559 | 0.9095 | 1355 | 1.1188 | 72734504 |
0.0849 | 0.9129 | 1360 | 1.1183 | 73007968 |
0.1273 | 0.9163 | 1365 | 1.1176 | 73273776 |
0.1651 | 0.9196 | 1370 | 1.1180 | 73533984 |
0.1954 | 0.9230 | 1375 | 1.1179 | 73799264 |
0.1017 | 0.9263 | 1380 | 1.1195 | 74063672 |
0.1404 | 0.9297 | 1385 | 1.1230 | 74331072 |
0.1401 | 0.9330 | 1390 | 1.1210 | 74601856 |
0.1239 | 0.9364 | 1395 | 1.1178 | 74866528 |
0.1397 | 0.9398 | 1400 | 1.1179 | 75135032 |
0.16 | 0.9431 | 1405 | 1.1200 | 75405976 |
0.1147 | 0.9465 | 1410 | 1.1163 | 75672872 |
0.1289 | 0.9498 | 1415 | 1.1161 | 75945704 |
0.132 | 0.9532 | 1420 | 1.1193 | 76209144 |
0.1108 | 0.9565 | 1425 | 1.1176 | 76484264 |
0.1409 | 0.9599 | 1430 | 1.1163 | 76745304 |
0.1304 | 0.9632 | 1435 | 1.1165 | 77014304 |
0.1815 | 0.9666 | 1440 | 1.1182 | 77290400 |
0.1298 | 0.9700 | 1445 | 1.1146 | 77555872 |
0.1753 | 0.9733 | 1450 | 1.1161 | 77822064 |
0.112 | 0.9767 | 1455 | 1.1166 | 78094360 |
0.1739 | 0.9800 | 1460 | 1.1157 | 78357168 |
0.1513 | 0.9834 | 1465 | 1.1166 | 78620568 |
0.1211 | 0.9867 | 1470 | 1.1154 | 78886576 |
0.0537 | 0.9901 | 1475 | 1.1160 | 79151488 |
0.1062 | 0.9935 | 1480 | 1.1175 | 79422336 |
0.1241 | 0.9968 | 1485 | 1.1178 | 79695256 |
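A log in this pipe-delimited form can be parsed directly, for example to find the evaluation point with the lowest validation loss. The sketch below works on a handful of rows copied from the table above rather than the full log, and the helper names `parse_rows` and `best_row` are illustrative.

```python
# A few rows copied verbatim from the training results table (not the full log).
SAMPLE_ROWS = """\
No log | 0 | 0 | 1.3956 | 0 |
1.0825 | 0.0336 | 50 | 1.1769 | 2656888 |
0.542 | 0.0571 | 85 | 1.2787 | 4543312 |
0.1202 | 0.3356 | 500 | 1.1681 | 26775344 |
0.127 | 0.6713 | 1000 | 1.1312 | 53615128 |
0.1298 | 0.9700 | 1445 | 1.1146 | 77555872 |
0.1241 | 0.9968 | 1485 | 1.1178 | 79695256 |
"""


def parse_rows(text):
    """Parse pipe-delimited log rows into dicts; 'No log' becomes None."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, then split into the five columns.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        rows.append({
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens_seen": int(tokens),
        })
    return rows


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r["val_loss"])


if __name__ == "__main__":
    best = best_row(parse_rows(SAMPLE_ROWS))
    print(best["step"], best["val_loss"])
```

Among the sampled rows, the lowest validation loss is the one logged at step 1445. Running the same helpers over the complete table would give the best logged point for the whole run.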
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1