---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd0
results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1131
- Num Input Tokens Seen: 71096848
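
This sketch assumes the reported loss is the mean per-token cross-entropy in nats. That is the usual convention for causal language models trained with the Hugging Face `Trainer`, but this card does not state it. Under that assumption, the loss converts directly to perplexity. The final evaluation loss of 1.1131 corresponds to a perplexity of about 3.04, down from about 4.04 at step 0. A minimal sketch of the conversion:

```python
import math


def loss_to_perplexity(mean_token_loss: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) to perplexity.

    Assumption: the loss is averaged over tokens and measured in nats,
    which is the usual convention for Hugging Face causal LM losses.
    """
    return math.exp(mean_token_loss)


# Final evaluation loss reported above, and the step-0 validation loss from the table.
print(round(loss_to_perplexity(1.1131), 2))  # 3.04
print(round(loss_to_perplexity(1.3956), 2))  # 4.04
```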
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
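
Several of these values depend on each other. In the `Trainer`, `total_train_batch_size` is the per-device batch size times the gradient accumulation steps times the number of devices. So 8 × 16 = 128 is consistent with a single device. The `constant_with_warmup` scheduler ramps the learning rate linearly from 0 to its peak during warmup and then holds it constant. When only a ratio is given, the warmup length is the ratio times the total step count, rounded up.

The sketch below reproduces that arithmetic. It is meant to mirror what `transformers.get_constant_schedule_with_warmup` computes, but it is not that function. The total of 1,327 optimizer steps is an assumption inferred from the Epoch column of the results table, not a value stated in this card.

```python
import math


def total_train_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return per_device * grad_accum_steps * num_devices


def num_warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up to a whole step."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, warmup_steps: int, base_lr: float) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


TOTAL_STEPS = 1327  # assumption: inferred from the Epoch column, not stated in the card
warmup = num_warmup_steps(TOTAL_STEPS, 0.05)
print(total_train_batch_size(8, 16))                  # 128
print(warmup)                                         # 67
print(constant_with_warmup_lr(warmup, warmup, 8e-6))  # 8e-06
```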
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6566 | 0.0038 | 5 | 1.3947 | 260448 |
| 1.6695 | 0.0075 | 10 | 1.3816 | 528568 |
| 1.7139 | 0.0113 | 15 | 1.3494 | 796400 |
| 1.5052 | 0.0151 | 20 | 1.2979 | 1060640 |
| 1.4331 | 0.0188 | 25 | 1.2529 | 1328144 |
| 1.4004 | 0.0226 | 30 | 1.2205 | 1596872 |
| 1.3011 | 0.0264 | 35 | 1.1862 | 1865792 |
| 1.2552 | 0.0301 | 40 | 1.1787 | 2133272 |
| 1.2465 | 0.0339 | 45 | 1.1768 | 2403512 |
| 1.1299 | 0.0377 | 50 | 1.1772 | 2672584 |
| 1.0704 | 0.0414 | 55 | 1.2056 | 2932768 |
| 0.8411 | 0.0452 | 60 | 1.2199 | 3198136 |
| 0.8744 | 0.0490 | 65 | 1.2342 | 3459968 |
| 0.7176 | 0.0527 | 70 | 1.2671 | 3729480 |
| 0.703 | 0.0565 | 75 | 1.2549 | 3994344 |
| 0.5925 | 0.0603 | 80 | 1.2701 | 4261976 |
| 0.5324 | 0.0641 | 85 | 1.2651 | 4530312 |
| 0.4766 | 0.0678 | 90 | 1.2761 | 4797672 |
| 0.373 | 0.0716 | 95 | 1.2599 | 5064656 |
| 0.3435 | 0.0754 | 100 | 1.2496 | 5335384 |
| 0.4207 | 0.0791 | 105 | 1.2433 | 5604936 |
| 0.3998 | 0.0829 | 110 | 1.2553 | 5871560 |
| 0.3312 | 0.0867 | 115 | 1.2232 | 6135528 |
| 0.3453 | 0.0904 | 120 | 1.2553 | 6390584 |
| 0.2781 | 0.0942 | 125 | 1.2282 | 6665328 |
| 0.2406 | 0.0980 | 130 | 1.2363 | 6936408 |
| 0.3009 | 0.1017 | 135 | 1.2267 | 7206256 |
| 0.3044 | 0.1055 | 140 | 1.2362 | 7480232 |
| 0.2592 | 0.1093 | 145 | 1.2185 | 7744392 |
| 0.2636 | 0.1130 | 150 | 1.2361 | 8012184 |
| 0.3114 | 0.1168 | 155 | 1.2103 | 8282584 |
| 0.1706 | 0.1206 | 160 | 1.2259 | 8554784 |
| 0.2471 | 0.1243 | 165 | 1.2148 | 8820848 |
| 0.2671 | 0.1281 | 170 | 1.2139 | 9086704 |
| 0.2599 | 0.1319 | 175 | 1.2155 | 9355368 |
| 0.2485 | 0.1356 | 180 | 1.2124 | 9614688 |
| 0.2381 | 0.1394 | 185 | 1.2155 | 9887872 |
| 0.2251 | 0.1432 | 190 | 1.2085 | 10151784 |
| 0.204 | 0.1469 | 195 | 1.2114 | 10412696 |
| 0.1944 | 0.1507 | 200 | 1.2103 | 10680368 |
| 0.2469 | 0.1545 | 205 | 1.2067 | 10949048 |
| 0.1688 | 0.1582 | 210 | 1.2094 | 11218120 |
| 0.2452 | 0.1620 | 215 | 1.2099 | 11492536 |
| 0.229 | 0.1658 | 220 | 1.1996 | 11757736 |
| 0.1922 | 0.1695 | 225 | 1.2024 | 12032888 |
| 0.1738 | 0.1733 | 230 | 1.2006 | 12306664 |
| 0.2207 | 0.1771 | 235 | 1.1985 | 12578256 |
| 0.2071 | 0.1809 | 240 | 1.2037 | 12833816 |
| 0.3222 | 0.1846 | 245 | 1.1961 | 13107088 |
| 0.2841 | 0.1884 | 250 | 1.1871 | 13377272 |
| 0.1682 | 0.1922 | 255 | 1.1972 | 13644496 |
| 0.2188 | 0.1959 | 260 | 1.1857 | 13910752 |
| 0.2263 | 0.1997 | 265 | 1.1943 | 14181304 |
| 0.2228 | 0.2035 | 270 | 1.1880 | 14449496 |
| 0.3173 | 0.2072 | 275 | 1.1912 | 14720088 |
| 0.1313 | 0.2110 | 280 | 1.1864 | 14981048 |
| 0.1712 | 0.2148 | 285 | 1.1948 | 15248632 |
| 0.2153 | 0.2185 | 290 | 1.1877 | 15519760 |
| 0.1722 | 0.2223 | 295 | 1.1838 | 15795848 |
| 0.2243 | 0.2261 | 300 | 1.1846 | 16067136 |
| 0.1962 | 0.2298 | 305 | 1.1831 | 16327448 |
| 0.161 | 0.2336 | 310 | 1.1840 | 16593288 |
| 0.2623 | 0.2374 | 315 | 1.1813 | 16866928 |
| 0.2177 | 0.2411 | 320 | 1.1876 | 17129832 |
| 0.2156 | 0.2449 | 325 | 1.1881 | 17394152 |
| 0.228 | 0.2487 | 330 | 1.1805 | 17656512 |
| 0.238 | 0.2524 | 335 | 1.1846 | 17929200 |
| 0.2212 | 0.2562 | 340 | 1.1764 | 18197488 |
| 0.1391 | 0.2600 | 345 | 1.1808 | 18462352 |
| 0.1714 | 0.2637 | 350 | 1.1762 | 18726528 |
| 0.2476 | 0.2675 | 355 | 1.1770 | 18997688 |
| 0.1979 | 0.2713 | 360 | 1.1788 | 19266416 |
| 0.1638 | 0.2750 | 365 | 1.1759 | 19538176 |
| 0.1463 | 0.2788 | 370 | 1.1780 | 19800504 |
| 0.2473 | 0.2826 | 375 | 1.1756 | 20065248 |
| 0.1515 | 0.2863 | 380 | 1.1742 | 20338008 |
| 0.22 | 0.2901 | 385 | 1.1774 | 20605992 |
| 0.1584 | 0.2939 | 390 | 1.1710 | 20874136 |
| 0.1519 | 0.2976 | 395 | 1.1748 | 21136504 |
| 0.2133 | 0.3014 | 400 | 1.1704 | 21404320 |
| 0.166 | 0.3052 | 405 | 1.1678 | 21671448 |
| 0.1311 | 0.3090 | 410 | 1.1683 | 21943872 |
| 0.1466 | 0.3127 | 415 | 1.1696 | 22208912 |
| 0.1702 | 0.3165 | 420 | 1.1659 | 22482168 |
| 0.1767 | 0.3203 | 425 | 1.1673 | 22748568 |
| 0.1405 | 0.3240 | 430 | 1.1647 | 23023400 |
| 0.1933 | 0.3278 | 435 | 1.1670 | 23297640 |
| 0.1724 | 0.3316 | 440 | 1.1634 | 23573816 |
| 0.1899 | 0.3353 | 445 | 1.1631 | 23845576 |
| 0.2009 | 0.3391 | 450 | 1.1635 | 24106912 |
| 0.1326 | 0.3429 | 455 | 1.1595 | 24377936 |
| 0.1951 | 0.3466 | 460 | 1.1615 | 24646488 |
| 0.1198 | 0.3504 | 465 | 1.1610 | 24906336 |
| 0.1398 | 0.3542 | 470 | 1.1603 | 25169824 |
| 0.1344 | 0.3579 | 475 | 1.1601 | 25437792 |
| 0.145 | 0.3617 | 480 | 1.1605 | 25706736 |
| 0.2458 | 0.3655 | 485 | 1.1586 | 25972256 |
| 0.1187 | 0.3692 | 490 | 1.1590 | 26241304 |
| 0.1643 | 0.3730 | 495 | 1.1548 | 26509800 |
| 0.168 | 0.3768 | 500 | 1.1580 | 26781448 |
| 0.1488 | 0.3805 | 505 | 1.1551 | 27049472 |
| 0.1768 | 0.3843 | 510 | 1.1527 | 27308840 |
| 0.1496 | 0.3881 | 515 | 1.1558 | 27576608 |
| 0.161 | 0.3918 | 520 | 1.1548 | 27841432 |
| 0.1698 | 0.3956 | 525 | 1.1507 | 28110448 |
| 0.1165 | 0.3994 | 530 | 1.1560 | 28387600 |
| 0.1199 | 0.4031 | 535 | 1.1522 | 28652704 |
| 0.1894 | 0.4069 | 540 | 1.1499 | 28920008 |
| 0.133 | 0.4107 | 545 | 1.1532 | 29191232 |
| 0.1374 | 0.4144 | 550 | 1.1513 | 29452400 |
| 0.2383 | 0.4182 | 555 | 1.1456 | 29727600 |
| 0.1106 | 0.4220 | 560 | 1.1501 | 29988120 |
| 0.1709 | 0.4258 | 565 | 1.1512 | 30258392 |
| 0.1103 | 0.4295 | 570 | 1.1449 | 30527160 |
| 0.162 | 0.4333 | 575 | 1.1459 | 30793960 |
| 0.1089 | 0.4371 | 580 | 1.1527 | 31059912 |
| 0.0828 | 0.4408 | 585 | 1.1482 | 31322848 |
| 0.1873 | 0.4446 | 590 | 1.1461 | 31589584 |
| 0.1427 | 0.4484 | 595 | 1.1487 | 31858056 |
| 0.1218 | 0.4521 | 600 | 1.1483 | 32123784 |
| 0.245 | 0.4559 | 605 | 1.1477 | 32388608 |
| 0.1676 | 0.4597 | 610 | 1.1476 | 32651048 |
| 0.1342 | 0.4634 | 615 | 1.1454 | 32913232 |
| 0.1094 | 0.4672 | 620 | 1.1438 | 33175744 |
| 0.1349 | 0.4710 | 625 | 1.1442 | 33444920 |
| 0.1095 | 0.4747 | 630 | 1.1452 | 33702168 |
| 0.1935 | 0.4785 | 635 | 1.1435 | 33966536 |
| 0.1939 | 0.4823 | 640 | 1.1408 | 34235024 |
| 0.1624 | 0.4860 | 645 | 1.1444 | 34494960 |
| 0.125 | 0.4898 | 650 | 1.1422 | 34764552 |
| 0.1819 | 0.4936 | 655 | 1.1416 | 35034640 |
| 0.1869 | 0.4973 | 660 | 1.1416 | 35303216 |
| 0.1828 | 0.5011 | 665 | 1.1395 | 35572208 |
| 0.127 | 0.5049 | 670 | 1.1428 | 35839704 |
| 0.1514 | 0.5086 | 675 | 1.1423 | 36104424 |
| 0.1498 | 0.5124 | 680 | 1.1407 | 36384232 |
| 0.1664 | 0.5162 | 685 | 1.1410 | 36643720 |
| 0.2449 | 0.5199 | 690 | 1.1408 | 36911152 |
| 0.1367 | 0.5237 | 695 | 1.1383 | 37175088 |
| 0.1596 | 0.5275 | 700 | 1.1400 | 37446552 |
| 0.1524 | 0.5312 | 705 | 1.1385 | 37712704 |
| 0.1385 | 0.5350 | 710 | 1.1396 | 37977976 |
| 0.1608 | 0.5388 | 715 | 1.1397 | 38241504 |
| 0.1901 | 0.5426 | 720 | 1.1354 | 38510888 |
| 0.1683 | 0.5463 | 725 | 1.1362 | 38775632 |
| 0.1451 | 0.5501 | 730 | 1.1382 | 39040800 |
| 0.1688 | 0.5539 | 735 | 1.1353 | 39311008 |
| 0.1722 | 0.5576 | 740 | 1.1349 | 39581648 |
| 0.1202 | 0.5614 | 745 | 1.1352 | 39847168 |
| 0.1367 | 0.5652 | 750 | 1.1360 | 40109944 |
| 0.1133 | 0.5689 | 755 | 1.1384 | 40373832 |
| 0.1548 | 0.5727 | 760 | 1.1348 | 40637424 |
| 0.1687 | 0.5765 | 765 | 1.1354 | 40909984 |
| 0.1564 | 0.5802 | 770 | 1.1355 | 41184376 |
| 0.1589 | 0.5840 | 775 | 1.1356 | 41451776 |
| 0.157 | 0.5878 | 780 | 1.1358 | 41717672 |
| 0.1521 | 0.5915 | 785 | 1.1333 | 41977840 |
| 0.1997 | 0.5953 | 790 | 1.1316 | 42245448 |
| 0.0921 | 0.5991 | 795 | 1.1358 | 42512768 |
| 0.1229 | 0.6028 | 800 | 1.1360 | 42776704 |
| 0.1362 | 0.6066 | 805 | 1.1329 | 43041624 |
| 0.1905 | 0.6104 | 810 | 1.1347 | 43305944 |
| 0.1746 | 0.6141 | 815 | 1.1326 | 43576528 |
| 0.136 | 0.6179 | 820 | 1.1330 | 43859088 |
| 0.1384 | 0.6217 | 825 | 1.1349 | 44129008 |
| 0.1666 | 0.6254 | 830 | 1.1326 | 44400272 |
| 0.1737 | 0.6292 | 835 | 1.1324 | 44665776 |
| 0.1311 | 0.6330 | 840 | 1.1332 | 44929632 |
| 0.1665 | 0.6367 | 845 | 1.1301 | 45199424 |
| 0.1462 | 0.6405 | 850 | 1.1313 | 45469104 |
| 0.2019 | 0.6443 | 855 | 1.1319 | 45743416 |
| 0.1069 | 0.6480 | 860 | 1.1303 | 46014232 |
| 0.1481 | 0.6518 | 865 | 1.1296 | 46283120 |
| 0.1104 | 0.6556 | 870 | 1.1337 | 46545048 |
| 0.1553 | 0.6594 | 875 | 1.1331 | 46806880 |
| 0.1482 | 0.6631 | 880 | 1.1293 | 47081296 |
| 0.1201 | 0.6669 | 885 | 1.1315 | 47342680 |
| 0.1542 | 0.6707 | 890 | 1.1330 | 47615504 |
| 0.1857 | 0.6744 | 895 | 1.1330 | 47886968 |
| 0.1275 | 0.6782 | 900 | 1.1286 | 48156312 |
| 0.2158 | 0.6820 | 905 | 1.1298 | 48426304 |
| 0.1745 | 0.6857 | 910 | 1.1308 | 48702504 |
| 0.1311 | 0.6895 | 915 | 1.1281 | 48971880 |
| 0.0881 | 0.6933 | 920 | 1.1264 | 49245216 |
| 0.1603 | 0.6970 | 925 | 1.1283 | 49505472 |
| 0.1545 | 0.7008 | 930 | 1.1323 | 49772424 |
| 0.1689 | 0.7046 | 935 | 1.1319 | 50043432 |
| 0.1401 | 0.7083 | 940 | 1.1257 | 50312480 |
| 0.1465 | 0.7121 | 945 | 1.1253 | 50582152 |
| 0.1771 | 0.7159 | 950 | 1.1279 | 50847896 |
| 0.154 | 0.7196 | 955 | 1.1294 | 51114472 |
| 0.0991 | 0.7234 | 960 | 1.1286 | 51381704 |
| 0.1351 | 0.7272 | 965 | 1.1250 | 51649304 |
| 0.1755 | 0.7309 | 970 | 1.1263 | 51917056 |
| 0.1479 | 0.7347 | 975 | 1.1261 | 52191024 |
| 0.1009 | 0.7385 | 980 | 1.1245 | 52466728 |
| 0.1896 | 0.7422 | 985 | 1.1257 | 52727896 |
| 0.1514 | 0.7460 | 990 | 1.1255 | 52995904 |
| 0.139 | 0.7498 | 995 | 1.1239 | 53262024 |
| 0.1015 | 0.7535 | 1000 | 1.1237 | 53532144 |
| 0.1576 | 0.7573 | 1005 | 1.1228 | 53796328 |
| 0.0906 | 0.7611 | 1010 | 1.1228 | 54077504 |
| 0.1352 | 0.7648 | 1015 | 1.1247 | 54347104 |
| 0.1678 | 0.7686 | 1020 | 1.1246 | 54614464 |
| 0.1426 | 0.7724 | 1025 | 1.1243 | 54884240 |
| 0.1914 | 0.7762 | 1030 | 1.1275 | 55153928 |
| 0.2259 | 0.7799 | 1035 | 1.1278 | 55416184 |
| 0.1302 | 0.7837 | 1040 | 1.1250 | 55691584 |
| 0.2046 | 0.7875 | 1045 | 1.1234 | 55952704 |
| 0.1555 | 0.7912 | 1050 | 1.1277 | 56214664 |
| 0.1183 | 0.7950 | 1055 | 1.1267 | 56484736 |
| 0.1964 | 0.7988 | 1060 | 1.1220 | 56749104 |
| 0.1295 | 0.8025 | 1065 | 1.1215 | 57016928 |
| 0.1882 | 0.8063 | 1070 | 1.1222 | 57284168 |
| 0.144 | 0.8101 | 1075 | 1.1211 | 57556280 |
| 0.1353 | 0.8138 | 1080 | 1.1204 | 57821504 |
| 0.0965 | 0.8176 | 1085 | 1.1256 | 58090616 |
| 0.1671 | 0.8214 | 1090 | 1.1247 | 58358832 |
| 0.1898 | 0.8251 | 1095 | 1.1192 | 58622312 |
| 0.1249 | 0.8289 | 1100 | 1.1211 | 58886416 |
| 0.1252 | 0.8327 | 1105 | 1.1223 | 59153184 |
| 0.1174 | 0.8364 | 1110 | 1.1207 | 59412672 |
| 0.1192 | 0.8402 | 1115 | 1.1202 | 59684656 |
| 0.186 | 0.8440 | 1120 | 1.1216 | 59958024 |
| 0.1481 | 0.8477 | 1125 | 1.1231 | 60228840 |
| 0.0945 | 0.8515 | 1130 | 1.1203 | 60503456 |
| 0.1925 | 0.8553 | 1135 | 1.1216 | 60767248 |
| 0.1177 | 0.8590 | 1140 | 1.1237 | 61038008 |
| 0.1211 | 0.8628 | 1145 | 1.1229 | 61310480 |
| 0.1874 | 0.8666 | 1150 | 1.1189 | 61578696 |
| 0.1435 | 0.8703 | 1155 | 1.1201 | 61841552 |
| 0.2144 | 0.8741 | 1160 | 1.1181 | 62114400 |
| 0.1731 | 0.8779 | 1165 | 1.1171 | 62383632 |
| 0.1033 | 0.8816 | 1170 | 1.1223 | 62646792 |
| 0.1179 | 0.8854 | 1175 | 1.1221 | 62915912 |
| 0.1592 | 0.8892 | 1180 | 1.1182 | 63185280 |
| 0.0983 | 0.8929 | 1185 | 1.1176 | 63455400 |
| 0.0904 | 0.8967 | 1190 | 1.1229 | 63720344 |
| 0.1897 | 0.9005 | 1195 | 1.1217 | 63986920 |
| 0.1155 | 0.9043 | 1200 | 1.1180 | 64260840 |
| 0.1181 | 0.9080 | 1205 | 1.1183 | 64524968 |
| 0.1302 | 0.9118 | 1210 | 1.1212 | 64794256 |
| 0.08 | 0.9156 | 1215 | 1.1216 | 65068040 |
| 0.1449 | 0.9193 | 1220 | 1.1182 | 65334568 |
| 0.1144 | 0.9231 | 1225 | 1.1191 | 65610360 |
| 0.1357 | 0.9269 | 1230 | 1.1197 | 65880032 |
| 0.1088 | 0.9306 | 1235 | 1.1171 | 66149696 |
| 0.1262 | 0.9344 | 1240 | 1.1178 | 66416168 |
| 0.1718 | 0.9382 | 1245 | 1.1190 | 66697408 |
| 0.2095 | 0.9419 | 1250 | 1.1164 | 66962696 |
| 0.2001 | 0.9457 | 1255 | 1.1174 | 67227976 |
| 0.2356 | 0.9495 | 1260 | 1.1182 | 67495688 |
| 0.146 | 0.9532 | 1265 | 1.1162 | 67771504 |
| 0.2264 | 0.9570 | 1270 | 1.1164 | 68045104 |
| 0.1425 | 0.9608 | 1275 | 1.1153 | 68314488 |
| 0.1339 | 0.9645 | 1280 | 1.1189 | 68579112 |
| 0.1162 | 0.9683 | 1285 | 1.1174 | 68840840 |
| 0.1216 | 0.9721 | 1290 | 1.1150 | 69107968 |
| 0.1872 | 0.9758 | 1295 | 1.1180 | 69376792 |
| 0.1574 | 0.9796 | 1300 | 1.1168 | 69652920 |
| 0.1585 | 0.9834 | 1305 | 1.1145 | 69922352 |
| 0.1555 | 0.9871 | 1310 | 1.1166 | 70191448 |
| 0.2285 | 0.9909 | 1315 | 1.1180 | 70458160 |
| 0.0577 | 0.9947 | 1320 | 1.1159 | 70725656 |
| 0.1695 | 0.9984 | 1325 | 1.1139 | 70985936 |
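
To work with this log programmatically, for example to find the lowest validation loss or to plot the curves, the table can be parsed into records. The sketch below uses only the standard library. The excerpt copies five rows verbatim from the table, and the parser assumes the exact five-column layout shown here.

```python
from typing import Optional, TypedDict


class LogRow(TypedDict):
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_results_table(markdown: str) -> list[LogRow]:
    """Parse a five-column Markdown results table into a list of records."""
    rows: list[LogRow] = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the ":---:" alignment row.
        if cells[0] == "Training Loss" or set(cells[0]) <= set(":-"):
            continue
        rows.append(LogRow(
            train_loss=None if cells[0] == "No log" else float(cells[0]),
            epoch=float(cells[1]),
            step=int(cells[2]),
            val_loss=float(cells[3]),
            tokens_seen=int(cells[4]),
        ))
    return rows


EXCERPT = """
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.2465 | 0.0339 | 45 | 1.1768 | 2403512 |
| 0.4766 | 0.0678 | 90 | 1.2761 | 4797672 |
| 0.0577 | 0.9947 | 1320 | 1.1159 | 70725656 |
| 0.1695 | 0.9984 | 1325 | 1.1139 | 70985936 |
"""

rows = parse_results_table(EXCERPT)
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])  # 1325 1.1139
```

Applied to the full table, the same `min` call also returns the step-1325 row (1.1139), which is the lowest validation loss logged during training. The final evaluation loss reported at the top of this card (1.1131) is slightly lower.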
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
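
The results above were produced with these library versions. When reproducing the run, it can help to compare a local environment against them first. Below is a minimal standard-library sketch. It makes two choices that are not from the card. First, it assumes "Pytorch" corresponds to the pip distribution `torch`. Second, it ignores local build tags such as `+cu121`, so a different CUDA or CPU build of the same release still counts as a match.

```python
from importlib import metadata
from typing import Callable, Optional

# Versions listed in this card; keys are pip distribution names (assumed).
PINNED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    pinned: dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {dist: (expected, found)} for every package that differs.

    The local tag after "+" is dropped on both sides before comparing.
    """
    out: dict[str, tuple[str, Optional[str]]] = {}
    for dist, expected in pinned.items():
        found = get_version(dist)
        if found is None or found.split("+")[0] != expected.split("+")[0]:
            out[dist] = (expected, found)
    return out


if __name__ == "__main__":
    for dist, (expected, found) in version_mismatches(PINNED).items():
        print(f"{dist}: card lists {expected}, installed {found or 'none'}")
```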