---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1114
- Num Input Tokens Seen: 30129080

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
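For reference, the sketch below shows how these hyperparameters might map onto a TRL `SFTConfig`/`SFTTrainer` setup (TRL ≥ 0.9, matching the Transformers 4.44 era below). This is a reconstruction, not the original training script: the training data is listed as unknown above, so the dataset repo id and text column are placeholders.

```python
# Reconstruction sketch only -- the original training script is not published.
# The dataset repo id and text column are placeholders; the card lists the
# training data as unknown.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "google/gemma-2-2b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder dataset; replace with the actual SFT corpus.
train_dataset = load_dataset("your-org/your-sft-dataset", split="train")

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = total batch size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the TrainingArguments defaults.
    dataset_text_field="text",  # assumed column name in the placeholder dataset
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```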
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6587        | 0.0092 | 5    | 1.3863          | 276696            |
| 1.5356        | 0.0185 | 10   | 1.3193          | 550488            |
| 1.4519        | 0.0277 | 15   | 1.2505          | 831264            |
| 1.3766        | 0.0369 | 20   | 1.1950          | 1107760           |
| 1.3159        | 0.0461 | 25   | 1.1635          | 1389896           |
| 1.1803        | 0.0554 | 30   | 1.1441          | 1668904           |
| 1.1339        | 0.0646 | 35   | 1.1371          | 1949864           |
| 0.9903        | 0.0738 | 40   | 1.1480          | 2233824           |
| 1.0179        | 0.0830 | 45   | 1.1589          | 2509936           |
| 0.9916        | 0.0923 | 50   | 1.1672          | 2793040           |
| 0.9015        | 0.1015 | 55   | 1.1839          | 3068680           |
| 0.8562        | 0.1107 | 60   | 1.1852          | 3347056           |
| 0.8485        | 0.1200 | 65   | 1.1998          | 3627512           |
| 0.7508        | 0.1292 | 70   | 1.2026          | 3905976           |
| 0.7357        | 0.1384 | 75   | 1.2045          | 4179328           |
| 0.6496        | 0.1476 | 80   | 1.1934          | 4453160           |
| 0.7891        | 0.1569 | 85   | 1.1950          | 4735096           |
| 0.5708        | 0.1661 | 90   | 1.1959          | 5015384           |
| 0.607         | 0.1753 | 95   | 1.2026          | 5284160           |
| 0.5427        | 0.1845 | 100  | 1.1955          | 5555648           |
| 0.4434        | 0.1938 | 105  | 1.1935          | 5839504           |
| 0.4716        | 0.2030 | 110  | 1.1997          | 6113904           |
| 0.5612        | 0.2122 | 115  | 1.1869          | 6394080           |
| 0.5522        | 0.2215 | 120  | 1.1934          | 6668968           |
| 0.4752        | 0.2307 | 125  | 1.1917          | 6943216           |
| 0.3948        | 0.2399 | 130  | 1.1873          | 7224944           |
| 0.4525        | 0.2491 | 135  | 1.1890          | 7499080           |
| 0.5147        | 0.2584 | 140  | 1.1814          | 7773104           |
| 0.4881        | 0.2676 | 145  | 1.1917          | 8050400           |
| 0.3915        | 0.2768 | 150  | 1.1842          | 8332168           |
| 0.4032        | 0.2860 | 155  | 1.1897          | 8608296           |
| 0.4227        | 0.2953 | 160  | 1.1804          | 8887936           |
| 0.4128        | 0.3045 | 165  | 1.1838          | 9164856           |
| 0.4097        | 0.3137 | 170  | 1.1759          | 9448376           |
| 0.3663        | 0.3230 | 175  | 1.1841          | 9721256           |
| 0.4311        | 0.3322 | 180  | 1.1780          | 9999808           |
| 0.3765        | 0.3414 | 185  | 1.1763          | 10273504          |
| 0.4953        | 0.3506 | 190  | 1.1663          | 10554248          |
| 0.3491        | 0.3599 | 195  | 1.1760          | 10835664          |
| 0.5705        | 0.3691 | 200  | 1.1670          | 11117936          |
| 0.3433        | 0.3783 | 205  | 1.1677          | 11394272          |
| 0.366         | 0.3875 | 210  | 1.1675          | 11667112          |
| 0.3678        | 0.3968 | 215  | 1.1643          | 11940344          |
| 0.3999        | 0.4060 | 220  | 1.1664          | 12226416          |
| 0.2779        | 0.4152 | 225  | 1.1623          | 12509896          |
| 0.2937        | 0.4245 | 230  | 1.1625          | 12789696          |
| 0.3232        | 0.4337 | 235  | 1.1577          | 13067376          |
| 0.2727        | 0.4429 | 240  | 1.1603          | 13347168          |
| 0.4066        | 0.4521 | 245  | 1.1549          | 13623832          |
| 0.3169        | 0.4614 | 250  | 1.1554          | 13902696          |
| 0.3345        | 0.4706 | 255  | 1.1557          | 14188000          |
| 0.3015        | 0.4798 | 260  | 1.1543          | 14470712          |
| 0.3465        | 0.4890 | 265  | 1.1519          | 14746408          |
| 0.3225        | 0.4983 | 270  | 1.1479          | 15018640          |
| 0.2737        | 0.5075 | 275  | 1.1483          | 15296928          |
| 0.3426        | 0.5167 | 280  | 1.1429          | 15574408          |
| 0.3332        | 0.5260 | 285  | 1.1446          | 15847000          |
| 0.2775        | 0.5352 | 290  | 1.1413          | 16126256          |
| 0.3818        | 0.5444 | 295  | 1.1398          | 16403872          |
| 0.402         | 0.5536 | 300  | 1.1409          | 16683200          |
| 0.3527        | 0.5629 | 305  | 1.1387          | 16957856          |
| 0.3747        | 0.5721 | 310  | 1.1381          | 17237088          |
| 0.2767        | 0.5813 | 315  | 1.1398          | 17514672          |
| 0.397         | 0.5905 | 320  | 1.1353          | 17790912          |
| 0.2713        | 0.5998 | 325  | 1.1355          | 18067224          |
| 0.3836        | 0.6090 | 330  | 1.1335          | 18345448          |
| 0.2953        | 0.6182 | 335  | 1.1340          | 18625288          |
| 0.3032        | 0.6275 | 340  | 1.1339          | 18895360          |
| 0.3337        | 0.6367 | 345  | 1.1315          | 19176592          |
| 0.2324        | 0.6459 | 350  | 1.1368          | 19456384          |
| 0.3954        | 0.6551 | 355  | 1.1290          | 19736048          |
| 0.3867        | 0.6644 | 360  | 1.1316          | 20017992          |
| 0.2376        | 0.6736 | 365  | 1.1317          | 20299128          |
| 0.2497        | 0.6828 | 370  | 1.1302          | 20572064          |
| 0.2433        | 0.6920 | 375  | 1.1295          | 20847344          |
| 0.3257        | 0.7013 | 380  | 1.1262          | 21131912          |
| 0.3596        | 0.7105 | 385  | 1.1299          | 21410128          |
| 0.3307        | 0.7197 | 390  | 1.1261          | 21691144          |
| 0.3911        | 0.7290 | 395  | 1.1277          | 21972080          |
| 0.3247        | 0.7382 | 400  | 1.1245          | 22254672          |
| 0.3654        | 0.7474 | 405  | 1.1262          | 22539544          |
| 0.2657        | 0.7566 | 410  | 1.1235          | 22820048          |
| 0.3721        | 0.7659 | 415  | 1.1242          | 23096928          |
| 0.2776        | 0.7751 | 420  | 1.1227          | 23369624          |
| 0.2669        | 0.7843 | 425  | 1.1249          | 23652232          |
| 0.3584        | 0.7935 | 430  | 1.1227          | 23931024          |
| 0.4058        | 0.8028 | 435  | 1.1194          | 24211728          |
| 0.271         | 0.8120 | 440  | 1.1246          | 24490376          |
| 0.2958        | 0.8212 | 445  | 1.1206          | 24772424          |
| 0.2507        | 0.8304 | 450  | 1.1214          | 25054744          |
| 0.3209        | 0.8397 | 455  | 1.1193          | 25331320          |
| 0.2983        | 0.8489 | 460  | 1.1173          | 25606720          |
| 0.302         | 0.8581 | 465  | 1.1181          | 25890600          |
| 0.4136        | 0.8674 | 470  | 1.1165          | 26167160          |
| 0.3069        | 0.8766 | 475  | 1.1179          | 26448160          |
| 0.2351        | 0.8858 | 480  | 1.1173          | 26723544          |
| 0.2373        | 0.8950 | 485  | 1.1175          | 27006408          |
| 0.3894        | 0.9043 | 490  | 1.1146          | 27281088          |
| 0.277         | 0.9135 | 495  | 1.1174          | 27562296          |
| 0.3009        | 0.9227 | 500  | 1.1151          | 27833952          |
| 0.3229        | 0.9319 | 505  | 1.1139          | 28106704          |
| 0.2891        | 0.9412 | 510  | 1.1161          | 28385768          |
| 0.2745        | 0.9504 | 515  | 1.1128          | 28670136          |
| 0.3377        | 0.9596 | 520  | 1.1158          | 28953688          |
| 0.3045        | 0.9689 | 525  | 1.1126          | 29230304          |
| 0.2475        | 0.9781 | 530  | 1.1150          | 29509224          |
| 0.2633        | 0.9873 | 535  | 1.1121          | 29791512          |
| 0.2622        | 0.9965 | 540  | 1.1105          | 30074936          |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
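A minimal inference sketch for loading this checkpoint with plain `transformers` is below. It is not part of the auto-generated card, and the repo id is a placeholder: prepend the owning user/org namespace on the Hub.

```python
# Minimal inference sketch. The repo id below is a placeholder; prepend the
# owning user/org namespace on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are distributed in bfloat16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```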