collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1169
- Num Input Tokens Seen: 55042096
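For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated on this card. The variable names are illustrative only.

```python
import math

# Values copied from the evaluation results above.
eval_loss = 1.1169
input_tokens_seen = 55_042_096

# Assumption: eval_loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
print(f"input tokens seen ~ {input_tokens_seen / 1e6:.1f}M")
```

If the loss were instead averaged per sequence or measured in bits, the conversion would differ, so treat the number as a rough guide rather than a reported metric.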
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
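The relationship between these values can be checked with a little arithmetic. The sketch below reproduces the total batch size and approximates the `constant_with_warmup` schedule in plain Python. It assumes a single device and assumes that the warmup length is the warmup ratio times the total step count, rounded up; neither detail is stated on this card, and the function name is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one device. With more devices the product would exceed 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# 1015 is the final Step value in the training results table.
total_steps = 1015
# Assumption: warmup length = ratio * total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def constant_with_warmup_lr(step: int) -> float:
    """Ramp linearly from 0 to learning_rate, then hold it constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps)
print(constant_with_warmup_lr(25), constant_with_warmup_lr(500))
```

Under these assumptions the learning rate reaches its full value of 8e-06 after roughly the first 5% of optimizer steps and stays there for the rest of the epoch.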
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6601 | 0.0049 | 5 | 1.3933 | 275208 |
1.6694 | 0.0099 | 10 | 1.3714 | 543056 |
1.536 | 0.0148 | 15 | 1.3191 | 812296 |
1.4788 | 0.0197 | 20 | 1.2663 | 1088768 |
1.3749 | 0.0246 | 25 | 1.2258 | 1364960 |
1.2692 | 0.0296 | 30 | 1.1844 | 1633832 |
1.2494 | 0.0345 | 35 | 1.1716 | 1910400 |
1.1173 | 0.0394 | 40 | 1.1704 | 2179696 |
1.05 | 0.0443 | 45 | 1.1607 | 2453432 |
1.0444 | 0.0493 | 50 | 1.1793 | 2732968 |
0.9271 | 0.0542 | 55 | 1.2081 | 3003744 |
0.791 | 0.0591 | 60 | 1.2350 | 3269488 |
0.7387 | 0.0640 | 65 | 1.2237 | 3540984 |
0.7249 | 0.0690 | 70 | 1.2430 | 3801792 |
0.5625 | 0.0739 | 75 | 1.2336 | 4085336 |
0.5989 | 0.0788 | 80 | 1.2534 | 4362880 |
0.5361 | 0.0837 | 85 | 1.2519 | 4631784 |
0.3883 | 0.0887 | 90 | 1.2304 | 4902152 |
0.4381 | 0.0936 | 95 | 1.2336 | 5169632 |
0.4399 | 0.0985 | 100 | 1.2397 | 5437272 |
0.4003 | 0.1034 | 105 | 1.2370 | 5709848 |
0.372 | 0.1084 | 110 | 1.2283 | 5976624 |
0.406 | 0.1133 | 115 | 1.2246 | 6248280 |
0.323 | 0.1182 | 120 | 1.2187 | 6518400 |
0.2448 | 0.1232 | 125 | 1.2250 | 6794920 |
0.3488 | 0.1281 | 130 | 1.2172 | 7069680 |
0.3836 | 0.1330 | 135 | 1.2070 | 7342192 |
0.3236 | 0.1379 | 140 | 1.2137 | 7610576 |
0.2973 | 0.1429 | 145 | 1.2076 | 7882736 |
0.2627 | 0.1478 | 150 | 1.2141 | 8160296 |
0.2389 | 0.1527 | 155 | 1.2010 | 8437848 |
0.3666 | 0.1576 | 160 | 1.2094 | 8712728 |
0.2794 | 0.1626 | 165 | 1.2009 | 8985576 |
0.2945 | 0.1675 | 170 | 1.2067 | 9256904 |
0.2087 | 0.1724 | 175 | 1.2007 | 9530952 |
0.2006 | 0.1773 | 180 | 1.2026 | 9800136 |
0.2707 | 0.1823 | 185 | 1.2024 | 10071408 |
0.244 | 0.1872 | 190 | 1.1974 | 10343552 |
0.2293 | 0.1921 | 195 | 1.2025 | 10614440 |
0.2367 | 0.1970 | 200 | 1.1977 | 10882064 |
0.2501 | 0.2020 | 205 | 1.1959 | 11159480 |
0.2884 | 0.2069 | 210 | 1.1941 | 11430048 |
0.2586 | 0.2118 | 215 | 1.1873 | 11695928 |
0.1916 | 0.2167 | 220 | 1.1968 | 11970272 |
0.2815 | 0.2217 | 225 | 1.1896 | 12242392 |
0.2537 | 0.2266 | 230 | 1.1859 | 12510872 |
0.1636 | 0.2315 | 235 | 1.1928 | 12782416 |
0.2667 | 0.2365 | 240 | 1.1867 | 13052648 |
0.2468 | 0.2414 | 245 | 1.1842 | 13325392 |
0.1954 | 0.2463 | 250 | 1.1913 | 13600336 |
0.2912 | 0.2512 | 255 | 1.1829 | 13869944 |
0.2128 | 0.2562 | 260 | 1.1821 | 14138536 |
0.1788 | 0.2611 | 265 | 1.1837 | 14407760 |
0.2236 | 0.2660 | 270 | 1.1810 | 14675520 |
0.1779 | 0.2709 | 275 | 1.1757 | 14953200 |
0.1968 | 0.2759 | 280 | 1.1780 | 15220960 |
0.2199 | 0.2808 | 285 | 1.1735 | 15483032 |
0.2181 | 0.2857 | 290 | 1.1746 | 15753752 |
0.259 | 0.2906 | 295 | 1.1725 | 16025120 |
0.1761 | 0.2956 | 300 | 1.1737 | 16300000 |
0.2333 | 0.3005 | 305 | 1.1736 | 16568800 |
0.2976 | 0.3054 | 310 | 1.1714 | 16843896 |
0.2326 | 0.3103 | 315 | 1.1723 | 17111568 |
0.1595 | 0.3153 | 320 | 1.1724 | 17386608 |
0.1421 | 0.3202 | 325 | 1.1670 | 17649840 |
0.2473 | 0.3251 | 330 | 1.1718 | 17918832 |
0.2452 | 0.3300 | 335 | 1.1666 | 18196184 |
0.1613 | 0.3350 | 340 | 1.1665 | 18471384 |
0.194 | 0.3399 | 345 | 1.1655 | 18747288 |
0.3169 | 0.3448 | 350 | 1.1648 | 19025008 |
0.2124 | 0.3498 | 355 | 1.1624 | 19292568 |
0.2656 | 0.3547 | 360 | 1.1614 | 19568560 |
0.3003 | 0.3596 | 365 | 1.1610 | 19834968 |
0.2401 | 0.3645 | 370 | 1.1580 | 20105912 |
0.1571 | 0.3695 | 375 | 1.1553 | 20379656 |
0.2318 | 0.3744 | 380 | 1.1608 | 20650848 |
0.1987 | 0.3793 | 385 | 1.1550 | 20923264 |
0.2576 | 0.3842 | 390 | 1.1584 | 21192336 |
0.218 | 0.3892 | 395 | 1.1577 | 21467624 |
0.1804 | 0.3941 | 400 | 1.1532 | 21734880 |
0.1477 | 0.3990 | 405 | 1.1585 | 22001608 |
0.1875 | 0.4039 | 410 | 1.1526 | 22273872 |
0.1754 | 0.4089 | 415 | 1.1520 | 22540176 |
0.2451 | 0.4138 | 420 | 1.1544 | 22810392 |
0.2631 | 0.4187 | 425 | 1.1498 | 23079816 |
0.2715 | 0.4236 | 430 | 1.1525 | 23348096 |
0.323 | 0.4286 | 435 | 1.1500 | 23622128 |
0.2832 | 0.4335 | 440 | 1.1457 | 23890632 |
0.1261 | 0.4384 | 445 | 1.1502 | 24159416 |
0.2168 | 0.4433 | 450 | 1.1503 | 24419904 |
0.2103 | 0.4483 | 455 | 1.1459 | 24687336 |
0.2608 | 0.4532 | 460 | 1.1481 | 24955080 |
0.22 | 0.4581 | 465 | 1.1443 | 25229640 |
0.1916 | 0.4631 | 470 | 1.1460 | 25501944 |
0.2282 | 0.4680 | 475 | 1.1426 | 25776592 |
0.1444 | 0.4729 | 480 | 1.1434 | 26048704 |
0.1415 | 0.4778 | 485 | 1.1462 | 26316704 |
0.185 | 0.4828 | 490 | 1.1472 | 26583320 |
0.1861 | 0.4877 | 495 | 1.1442 | 26848368 |
0.2444 | 0.4926 | 500 | 1.1419 | 27127416 |
0.149 | 0.4975 | 505 | 1.1452 | 27396328 |
0.1879 | 0.5025 | 510 | 1.1436 | 27669312 |
0.1951 | 0.5074 | 515 | 1.1413 | 27941728 |
0.1736 | 0.5123 | 520 | 1.1404 | 28213376 |
0.2361 | 0.5172 | 525 | 1.1408 | 28479464 |
0.144 | 0.5222 | 530 | 1.1401 | 28749592 |
0.2333 | 0.5271 | 535 | 1.1374 | 29024360 |
0.1981 | 0.5320 | 540 | 1.1400 | 29294184 |
0.2333 | 0.5369 | 545 | 1.1390 | 29566272 |
0.2308 | 0.5419 | 550 | 1.1370 | 29830248 |
0.1955 | 0.5468 | 555 | 1.1402 | 30101136 |
0.1906 | 0.5517 | 560 | 1.1387 | 30372424 |
0.2144 | 0.5567 | 565 | 1.1347 | 30646952 |
0.1965 | 0.5616 | 570 | 1.1368 | 30908728 |
0.2239 | 0.5665 | 575 | 1.1374 | 31183896 |
0.2104 | 0.5714 | 580 | 1.1331 | 31457680 |
0.2487 | 0.5764 | 585 | 1.1344 | 31731136 |
0.1382 | 0.5813 | 590 | 1.1355 | 32004256 |
0.186 | 0.5862 | 595 | 1.1358 | 32271512 |
0.1755 | 0.5911 | 600 | 1.1321 | 32542736 |
0.207 | 0.5961 | 605 | 1.1340 | 32812256 |
0.2216 | 0.6010 | 610 | 1.1342 | 33085400 |
0.2461 | 0.6059 | 615 | 1.1324 | 33351528 |
0.1588 | 0.6108 | 620 | 1.1333 | 33621000 |
0.2488 | 0.6158 | 625 | 1.1328 | 33894352 |
0.181 | 0.6207 | 630 | 1.1314 | 34162640 |
0.2122 | 0.6256 | 635 | 1.1305 | 34441064 |
0.1398 | 0.6305 | 640 | 1.1329 | 34708416 |
0.1988 | 0.6355 | 645 | 1.1295 | 34979800 |
0.2596 | 0.6404 | 650 | 1.1311 | 35247784 |
0.2201 | 0.6453 | 655 | 1.1333 | 35517048 |
0.1438 | 0.6502 | 660 | 1.1319 | 35789536 |
0.1782 | 0.6552 | 665 | 1.1336 | 36051200 |
0.1692 | 0.6601 | 670 | 1.1314 | 36323000 |
0.1822 | 0.6650 | 675 | 1.1290 | 36599392 |
0.1981 | 0.6700 | 680 | 1.1326 | 36870968 |
0.1644 | 0.6749 | 685 | 1.1307 | 37137392 |
0.2556 | 0.6798 | 690 | 1.1259 | 37411192 |
0.1742 | 0.6847 | 695 | 1.1295 | 37680888 |
0.1956 | 0.6897 | 700 | 1.1290 | 37949912 |
0.1299 | 0.6946 | 705 | 1.1281 | 38216184 |
0.1665 | 0.6995 | 710 | 1.1307 | 38485544 |
0.2755 | 0.7044 | 715 | 1.1260 | 38759264 |
0.1837 | 0.7094 | 720 | 1.1259 | 39020752 |
0.1687 | 0.7143 | 725 | 1.1282 | 39293040 |
0.1264 | 0.7192 | 730 | 1.1267 | 39568136 |
0.2541 | 0.7241 | 735 | 1.1279 | 39839448 |
0.1304 | 0.7291 | 740 | 1.1284 | 40116608 |
0.2105 | 0.7340 | 745 | 1.1281 | 40383120 |
0.1929 | 0.7389 | 750 | 1.1247 | 40651072 |
0.2045 | 0.7438 | 755 | 1.1267 | 40929488 |
0.2181 | 0.7488 | 760 | 1.1267 | 41199448 |
0.2374 | 0.7537 | 765 | 1.1251 | 41478632 |
0.1643 | 0.7586 | 770 | 1.1266 | 41749592 |
0.1818 | 0.7635 | 775 | 1.1250 | 42021576 |
0.1775 | 0.7685 | 780 | 1.1246 | 42289112 |
0.1259 | 0.7734 | 785 | 1.1264 | 42557584 |
0.1973 | 0.7783 | 790 | 1.1243 | 42822968 |
0.1677 | 0.7833 | 795 | 1.1259 | 43095848 |
0.2458 | 0.7882 | 800 | 1.1257 | 43366576 |
0.1226 | 0.7931 | 805 | 1.1220 | 43635976 |
0.2169 | 0.7980 | 810 | 1.1268 | 43906296 |
0.1237 | 0.8030 | 815 | 1.1263 | 44180384 |
0.2049 | 0.8079 | 820 | 1.1226 | 44444712 |
0.1323 | 0.8128 | 825 | 1.1236 | 44719944 |
0.1943 | 0.8177 | 830 | 1.1254 | 44993064 |
0.1782 | 0.8227 | 835 | 1.1249 | 45266512 |
0.2226 | 0.8276 | 840 | 1.1236 | 45533648 |
0.124 | 0.8325 | 845 | 1.1225 | 45804216 |
0.1541 | 0.8374 | 850 | 1.1214 | 46079880 |
0.1737 | 0.8424 | 855 | 1.1219 | 46348160 |
0.1943 | 0.8473 | 860 | 1.1231 | 46622696 |
0.1656 | 0.8522 | 865 | 1.1215 | 46897472 |
0.2735 | 0.8571 | 870 | 1.1232 | 47169856 |
0.2191 | 0.8621 | 875 | 1.1207 | 47441544 |
0.1572 | 0.8670 | 880 | 1.1191 | 47711248 |
0.2098 | 0.8719 | 885 | 1.1229 | 47992104 |
0.1243 | 0.8768 | 890 | 1.1214 | 48260960 |
0.1993 | 0.8818 | 895 | 1.1194 | 48531184 |
0.1662 | 0.8867 | 900 | 1.1204 | 48801416 |
0.1656 | 0.8916 | 905 | 1.1216 | 49071632 |
0.1585 | 0.8966 | 910 | 1.1188 | 49346128 |
0.1253 | 0.9015 | 915 | 1.1213 | 49620760 |
0.1226 | 0.9064 | 920 | 1.1216 | 49898432 |
0.2 | 0.9113 | 925 | 1.1183 | 50174576 |
0.0812 | 0.9163 | 930 | 1.1189 | 50444160 |
0.1893 | 0.9212 | 935 | 1.1239 | 50714744 |
0.2024 | 0.9261 | 940 | 1.1217 | 50982000 |
0.1282 | 0.9310 | 945 | 1.1195 | 51253960 |
0.1622 | 0.9360 | 950 | 1.1198 | 51528736 |
0.1918 | 0.9409 | 955 | 1.1181 | 51801648 |
0.1359 | 0.9458 | 960 | 1.1174 | 52079152 |
0.152 | 0.9507 | 965 | 1.1186 | 52346792 |
0.2182 | 0.9557 | 970 | 1.1161 | 52614496 |
0.2059 | 0.9606 | 975 | 1.1155 | 52876808 |
0.1561 | 0.9655 | 980 | 1.1174 | 53155432 |
0.1907 | 0.9704 | 985 | 1.1158 | 53420992 |
0.1577 | 0.9754 | 990 | 1.1163 | 53690640 |
0.1971 | 0.9803 | 995 | 1.1185 | 53961192 |
0.231 | 0.9852 | 1000 | 1.1161 | 54235384 |
0.1759 | 0.9901 | 1005 | 1.1135 | 54502912 |
0.181 | 0.9951 | 1010 | 1.1162 | 54775312 |
0.1815 | 1.0 | 1015 | 1.1169 | 55042096 |
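The table can be summarized programmatically. The sketch below uses a handful of rows copied from the table (the starting point, the early local minimum at step 45, the subsequent peak at step 80, the lowest validation loss, and the final row) to locate the best evaluated step and to estimate the average number of tokens per optimizer step. The per-example figure assumes the token counter covers every sequence in each batch of 128, which this card does not confirm.

```python
# Rows copied from the table above: (step, validation_loss, input_tokens_seen).
rows = [
    (0, 1.3956, 0),
    (45, 1.1607, 2453432),
    (80, 1.2534, 4362880),
    (1005, 1.1135, 54502912),
    (1015, 1.1169, 55042096),
]

# Step with the lowest validation loss among the sampled rows.
best_step, best_loss, _ = min(rows, key=lambda r: r[1])

# Average token throughput over the whole run.
final_step, final_loss, final_tokens = rows[-1]
tokens_per_step = final_tokens / final_step

# Assumption: each optimizer step consumes total_train_batch_size = 128 examples.
tokens_per_example = tokens_per_step / 128

print(f"best step: {best_step} (validation loss {best_loss})")
print(f"tokens per step ~ {tokens_per_step:.0f}")
print(f"tokens per example ~ {tokens_per_example:.0f}")
```

The lowest validation loss in the table (1.1135 at step 1005) is slightly below the final reported value of 1.1169, while the training loss ends far lower than either, so the validation column is the more informative one for comparing checkpoints.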
Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1: base model google/gemma-2-2b