---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1020
- Num Input Tokens Seen: 38487592

## Model description

More information needed

## Intended uses & limitations

More information needed
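Pending more detail from the authors, the checkpoint should load like any other `transformers` causal language model. The snippet below is a minimal sketch using the standard generation API; the repository namespace is a placeholder, since this card does not state where the model is hosted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id: the card does not state the owning namespace.
model_id = "<namespace>/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Write a haiku about fine-tuning.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```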
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
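Since the tags mark this as a TRL SFT run, the hyperparameters above map directly onto `transformers.TrainingArguments`. The block below is a hedged reconstruction, not the actual training script: `output_dir` is hypothetical, and the dataset/trainer wiring is omitted because the card does not specify it. Note that the total train batch size of 128 follows from 8 samples per device × 16 gradient accumulation steps on a single device.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above;
# output_dir is a hypothetical placeholder, not taken from this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    adam_beta1=0.9,                   # Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,               # and epsilon=1e-08
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
)
```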
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6684 | 0.0071 | 5 | 1.3917 | 267136 |
| 1.5973 | 0.0143 | 10 | 1.3492 | 527944 |
| 1.4177 | 0.0214 | 15 | 1.2804 | 806832 |
| 1.3667 | 0.0285 | 20 | 1.2239 | 1089448 |
| 1.327 | 0.0356 | 25 | 1.1805 | 1360128 |
| 1.1902 | 0.0428 | 30 | 1.1767 | 1645016 |
| 1.1371 | 0.0499 | 35 | 1.1639 | 1922096 |
| 1.0204 | 0.0570 | 40 | 1.1790 | 2195688 |
| 0.8439 | 0.0642 | 45 | 1.1984 | 2468224 |
| 0.8398 | 0.0713 | 50 | 1.2357 | 2748648 |
| 0.6944 | 0.0784 | 55 | 1.2190 | 3026552 |
| 0.6448 | 0.0855 | 60 | 1.2495 | 3301512 |
| 0.674 | 0.0927 | 65 | 1.2314 | 3571048 |
| 0.5917 | 0.0998 | 70 | 1.2129 | 3845032 |
| 0.4513 | 0.1069 | 75 | 1.2212 | 4112448 |
| 0.4732 | 0.1141 | 80 | 1.2010 | 4388696 |
| 0.5147 | 0.1212 | 85 | 1.2146 | 4668664 |
| 0.4466 | 0.1283 | 90 | 1.1984 | 4940912 |
| 0.3307 | 0.1354 | 95 | 1.2064 | 5215928 |
| 0.4373 | 0.1426 | 100 | 1.1983 | 5491272 |
| 0.4091 | 0.1497 | 105 | 1.1922 | 5771016 |
| 0.3565 | 0.1568 | 110 | 1.1836 | 6042648 |
| 0.4144 | 0.1640 | 115 | 1.1901 | 6319168 |
| 0.3271 | 0.1711 | 120 | 1.1863 | 6595848 |
| 0.3036 | 0.1782 | 125 | 1.1822 | 6874424 |
| 0.247 | 0.1854 | 130 | 1.1854 | 7150560 |
| 0.2981 | 0.1925 | 135 | 1.1752 | 7427296 |
| 0.2897 | 0.1996 | 140 | 1.1820 | 7699936 |
| 0.3774 | 0.2067 | 145 | 1.1722 | 7974304 |
| 0.2749 | 0.2139 | 150 | 1.1697 | 8243304 |
| 0.1711 | 0.2210 | 155 | 1.1795 | 8514432 |
| 0.3155 | 0.2281 | 160 | 1.1652 | 8786576 |
| 0.2774 | 0.2353 | 165 | 1.1709 | 9067648 |
| 0.3152 | 0.2424 | 170 | 1.1679 | 9337744 |
| 0.3076 | 0.2495 | 175 | 1.1645 | 9614672 |
| 0.2671 | 0.2566 | 180 | 1.1619 | 9891496 |
| 0.2063 | 0.2638 | 185 | 1.1608 | 10166192 |
| 0.1924 | 0.2709 | 190 | 1.1600 | 10441352 |
| 0.2558 | 0.2780 | 195 | 1.1575 | 10718632 |
| 0.2587 | 0.2852 | 200 | 1.1601 | 10990920 |
| 0.3404 | 0.2923 | 205 | 1.1566 | 11267848 |
| 0.2668 | 0.2994 | 210 | 1.1547 | 11541440 |
| 0.2414 | 0.3065 | 215 | 1.1554 | 11815968 |
| 0.2503 | 0.3137 | 220 | 1.1508 | 12086520 |
| 0.2804 | 0.3208 | 225 | 1.1537 | 12362432 |
| 0.2019 | 0.3279 | 230 | 1.1510 | 12629384 |
| 0.2269 | 0.3351 | 235 | 1.1474 | 12906600 |
| 0.2972 | 0.3422 | 240 | 1.1543 | 13182328 |
| 0.1945 | 0.3493 | 245 | 1.1487 | 13454848 |
| 0.2719 | 0.3564 | 250 | 1.1463 | 13725400 |
| 0.3308 | 0.3636 | 255 | 1.1463 | 14002992 |
| 0.2309 | 0.3707 | 260 | 1.1442 | 14273016 |
| 0.2641 | 0.3778 | 265 | 1.1388 | 14546376 |
| 0.2995 | 0.3850 | 270 | 1.1452 | 14822144 |
| 0.2778 | 0.3921 | 275 | 1.1409 | 15099184 |
| 0.2189 | 0.3992 | 280 | 1.1374 | 15377816 |
| 0.2998 | 0.4063 | 285 | 1.1414 | 15651240 |
| 0.3122 | 0.4135 | 290 | 1.1391 | 15922608 |
| 0.3337 | 0.4206 | 295 | 1.1342 | 16193632 |
| 0.2351 | 0.4277 | 300 | 1.1360 | 16469976 |
| 0.2763 | 0.4349 | 305 | 1.1346 | 16740760 |
| 0.3261 | 0.4420 | 310 | 1.1370 | 17015216 |
| 0.2783 | 0.4491 | 315 | 1.1364 | 17289608 |
| 0.2433 | 0.4562 | 320 | 1.1320 | 17557448 |
| 0.2029 | 0.4634 | 325 | 1.1329 | 17828456 |
| 0.2399 | 0.4705 | 330 | 1.1352 | 18104216 |
| 0.2676 | 0.4776 | 335 | 1.1298 | 18376544 |
| 0.2009 | 0.4848 | 340 | 1.1345 | 18650968 |
| 0.3097 | 0.4919 | 345 | 1.1312 | 18928000 |
| 0.2695 | 0.4990 | 350 | 1.1259 | 19197288 |
| 0.2933 | 0.5061 | 355 | 1.1309 | 19474976 |
| 0.2231 | 0.5133 | 360 | 1.1298 | 19761168 |
| 0.3188 | 0.5204 | 365 | 1.1267 | 20035664 |
| 0.2614 | 0.5275 | 370 | 1.1306 | 20311304 |
| 0.2824 | 0.5347 | 375 | 1.1279 | 20587848 |
| 0.2569 | 0.5418 | 380 | 1.1238 | 20863952 |
| 0.2747 | 0.5489 | 385 | 1.1257 | 21149864 |
| 0.258 | 0.5561 | 390 | 1.1274 | 21424128 |
| 0.2175 | 0.5632 | 395 | 1.1243 | 21700024 |
| 0.2213 | 0.5703 | 400 | 1.1246 | 21974976 |
| 0.3015 | 0.5774 | 405 | 1.1230 | 22241808 |
| 0.2435 | 0.5846 | 410 | 1.1218 | 22516720 |
| 0.2905 | 0.5917 | 415 | 1.1241 | 22789008 |
| 0.2361 | 0.5988 | 420 | 1.1221 | 23067672 |
| 0.2975 | 0.6060 | 425 | 1.1212 | 23342176 |
| 0.2594 | 0.6131 | 430 | 1.1214 | 23612040 |
| 0.2303 | 0.6202 | 435 | 1.1207 | 23887616 |
| 0.2454 | 0.6273 | 440 | 1.1195 | 24162232 |
| 0.2677 | 0.6345 | 445 | 1.1196 | 24433008 |
| 0.1848 | 0.6416 | 450 | 1.1196 | 24705832 |
| 0.2359 | 0.6487 | 455 | 1.1208 | 24984040 |
| 0.2962 | 0.6559 | 460 | 1.1212 | 25256024 |
| 0.2943 | 0.6630 | 465 | 1.1179 | 25525664 |
| 0.2482 | 0.6701 | 470 | 1.1191 | 25802976 |
| 0.2206 | 0.6772 | 475 | 1.1156 | 26079952 |
| 0.3008 | 0.6844 | 480 | 1.1175 | 26355712 |
| 0.1662 | 0.6915 | 485 | 1.1171 | 26631360 |
| 0.2349 | 0.6986 | 490 | 1.1161 | 26910880 |
| 0.1984 | 0.7058 | 495 | 1.1152 | 27189568 |
| 0.1594 | 0.7129 | 500 | 1.1176 | 27462312 |
| 0.2599 | 0.7200 | 505 | 1.1168 | 27734488 |
| 0.2337 | 0.7271 | 510 | 1.1125 | 28014184 |
| 0.2884 | 0.7343 | 515 | 1.1154 | 28292584 |
| 0.1878 | 0.7414 | 520 | 1.1138 | 28566848 |
| 0.2564 | 0.7485 | 525 | 1.1124 | 28850664 |
| 0.2353 | 0.7557 | 530 | 1.1127 | 29124184 |
| 0.2854 | 0.7628 | 535 | 1.1136 | 29401408 |
| 0.1839 | 0.7699 | 540 | 1.1118 | 29680840 |
| 0.1636 | 0.7770 | 545 | 1.1113 | 29960360 |
| 0.317 | 0.7842 | 550 | 1.1140 | 30233968 |
| 0.267 | 0.7913 | 555 | 1.1101 | 30507104 |
| 0.1583 | 0.7984 | 560 | 1.1127 | 30782136 |
| 0.2464 | 0.8056 | 565 | 1.1143 | 31061608 |
| 0.22 | 0.8127 | 570 | 1.1096 | 31333776 |
| 0.211 | 0.8198 | 575 | 1.1095 | 31608144 |
| 0.3073 | 0.8269 | 580 | 1.1112 | 31876368 |
| 0.1747 | 0.8341 | 585 | 1.1084 | 32146688 |
| 0.2157 | 0.8412 | 590 | 1.1102 | 32419328 |
| 0.2618 | 0.8483 | 595 | 1.1089 | 32690328 |
| 0.2084 | 0.8555 | 600 | 1.1064 | 32960256 |
| 0.2344 | 0.8626 | 605 | 1.1063 | 33234896 |
| 0.2234 | 0.8697 | 610 | 1.1096 | 33509632 |
| 0.2156 | 0.8768 | 615 | 1.1068 | 33781672 |
| 0.3154 | 0.8840 | 620 | 1.1046 | 34058936 |
| 0.2087 | 0.8911 | 625 | 1.1089 | 34334296 |
| 0.1694 | 0.8982 | 630 | 1.1063 | 34603152 |
| 0.2507 | 0.9054 | 635 | 1.1040 | 34874256 |
| 0.2275 | 0.9125 | 640 | 1.1057 | 35144432 |
| 0.2456 | 0.9196 | 645 | 1.1060 | 35423104 |
| 0.236 | 0.9268 | 650 | 1.1071 | 35688376 |
| 0.2216 | 0.9339 | 655 | 1.1074 | 35964360 |
| 0.2621 | 0.9410 | 660 | 1.1058 | 36242960 |
| 0.2174 | 0.9481 | 665 | 1.1031 | 36512112 |
| 0.2301 | 0.9553 | 670 | 1.1044 | 36780048 |
| 0.2529 | 0.9624 | 675 | 1.1049 | 37051992 |
| 0.2614 | 0.9695 | 680 | 1.1038 | 37328608 |
| 0.2334 | 0.9767 | 685 | 1.1023 | 37609592 |
| 0.1567 | 0.9838 | 690 | 1.1042 | 37882008 |
| 0.2197 | 0.9909 | 695 | 1.1037 | 38152448 |
| 0.2266 | 0.9980 | 700 | 1.1021 | 38431096 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1