# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1114
- Num Input Tokens Seen: 30129080
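
As a quick way to try the checkpoint, here is a minimal inference sketch, assuming the weights are available under the repo ID `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0`; the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`; drop this argument to load on the default device
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```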
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
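
For reference, a sketch of how these values would map onto Hugging Face `TrainingArguments`; dataset preparation, model loading, and the `Trainer` call are omitted, and the output directory name is illustrative.

```python
# Sketch of TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```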
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6587 | 0.0092 | 5 | 1.3863 | 276696 |
1.5356 | 0.0185 | 10 | 1.3193 | 550488 |
1.4519 | 0.0277 | 15 | 1.2505 | 831264 |
1.3766 | 0.0369 | 20 | 1.1950 | 1107760 |
1.3159 | 0.0461 | 25 | 1.1635 | 1389896 |
1.1803 | 0.0554 | 30 | 1.1441 | 1668904 |
1.1339 | 0.0646 | 35 | 1.1371 | 1949864 |
0.9903 | 0.0738 | 40 | 1.1480 | 2233824 |
1.0179 | 0.0830 | 45 | 1.1589 | 2509936 |
0.9916 | 0.0923 | 50 | 1.1672 | 2793040 |
0.9015 | 0.1015 | 55 | 1.1839 | 3068680 |
0.8562 | 0.1107 | 60 | 1.1852 | 3347056 |
0.8485 | 0.1200 | 65 | 1.1998 | 3627512 |
0.7508 | 0.1292 | 70 | 1.2026 | 3905976 |
0.7357 | 0.1384 | 75 | 1.2045 | 4179328 |
0.6496 | 0.1476 | 80 | 1.1934 | 4453160 |
0.7891 | 0.1569 | 85 | 1.1950 | 4735096 |
0.5708 | 0.1661 | 90 | 1.1959 | 5015384 |
0.607 | 0.1753 | 95 | 1.2026 | 5284160 |
0.5427 | 0.1845 | 100 | 1.1955 | 5555648 |
0.4434 | 0.1938 | 105 | 1.1935 | 5839504 |
0.4716 | 0.2030 | 110 | 1.1997 | 6113904 |
0.5612 | 0.2122 | 115 | 1.1869 | 6394080 |
0.5522 | 0.2215 | 120 | 1.1934 | 6668968 |
0.4752 | 0.2307 | 125 | 1.1917 | 6943216 |
0.3948 | 0.2399 | 130 | 1.1873 | 7224944 |
0.4525 | 0.2491 | 135 | 1.1890 | 7499080 |
0.5147 | 0.2584 | 140 | 1.1814 | 7773104 |
0.4881 | 0.2676 | 145 | 1.1917 | 8050400 |
0.3915 | 0.2768 | 150 | 1.1842 | 8332168 |
0.4032 | 0.2860 | 155 | 1.1897 | 8608296 |
0.4227 | 0.2953 | 160 | 1.1804 | 8887936 |
0.4128 | 0.3045 | 165 | 1.1838 | 9164856 |
0.4097 | 0.3137 | 170 | 1.1759 | 9448376 |
0.3663 | 0.3230 | 175 | 1.1841 | 9721256 |
0.4311 | 0.3322 | 180 | 1.1780 | 9999808 |
0.3765 | 0.3414 | 185 | 1.1763 | 10273504 |
0.4953 | 0.3506 | 190 | 1.1663 | 10554248 |
0.3491 | 0.3599 | 195 | 1.1760 | 10835664 |
0.5705 | 0.3691 | 200 | 1.1670 | 11117936 |
0.3433 | 0.3783 | 205 | 1.1677 | 11394272 |
0.366 | 0.3875 | 210 | 1.1675 | 11667112 |
0.3678 | 0.3968 | 215 | 1.1643 | 11940344 |
0.3999 | 0.4060 | 220 | 1.1664 | 12226416 |
0.2779 | 0.4152 | 225 | 1.1623 | 12509896 |
0.2937 | 0.4245 | 230 | 1.1625 | 12789696 |
0.3232 | 0.4337 | 235 | 1.1577 | 13067376 |
0.2727 | 0.4429 | 240 | 1.1603 | 13347168 |
0.4066 | 0.4521 | 245 | 1.1549 | 13623832 |
0.3169 | 0.4614 | 250 | 1.1554 | 13902696 |
0.3345 | 0.4706 | 255 | 1.1557 | 14188000 |
0.3015 | 0.4798 | 260 | 1.1543 | 14470712 |
0.3465 | 0.4890 | 265 | 1.1519 | 14746408 |
0.3225 | 0.4983 | 270 | 1.1479 | 15018640 |
0.2737 | 0.5075 | 275 | 1.1483 | 15296928 |
0.3426 | 0.5167 | 280 | 1.1429 | 15574408 |
0.3332 | 0.5260 | 285 | 1.1446 | 15847000 |
0.2775 | 0.5352 | 290 | 1.1413 | 16126256 |
0.3818 | 0.5444 | 295 | 1.1398 | 16403872 |
0.402 | 0.5536 | 300 | 1.1409 | 16683200 |
0.3527 | 0.5629 | 305 | 1.1387 | 16957856 |
0.3747 | 0.5721 | 310 | 1.1381 | 17237088 |
0.2767 | 0.5813 | 315 | 1.1398 | 17514672 |
0.397 | 0.5905 | 320 | 1.1353 | 17790912 |
0.2713 | 0.5998 | 325 | 1.1355 | 18067224 |
0.3836 | 0.6090 | 330 | 1.1335 | 18345448 |
0.2953 | 0.6182 | 335 | 1.1340 | 18625288 |
0.3032 | 0.6275 | 340 | 1.1339 | 18895360 |
0.3337 | 0.6367 | 345 | 1.1315 | 19176592 |
0.2324 | 0.6459 | 350 | 1.1368 | 19456384 |
0.3954 | 0.6551 | 355 | 1.1290 | 19736048 |
0.3867 | 0.6644 | 360 | 1.1316 | 20017992 |
0.2376 | 0.6736 | 365 | 1.1317 | 20299128 |
0.2497 | 0.6828 | 370 | 1.1302 | 20572064 |
0.2433 | 0.6920 | 375 | 1.1295 | 20847344 |
0.3257 | 0.7013 | 380 | 1.1262 | 21131912 |
0.3596 | 0.7105 | 385 | 1.1299 | 21410128 |
0.3307 | 0.7197 | 390 | 1.1261 | 21691144 |
0.3911 | 0.7290 | 395 | 1.1277 | 21972080 |
0.3247 | 0.7382 | 400 | 1.1245 | 22254672 |
0.3654 | 0.7474 | 405 | 1.1262 | 22539544 |
0.2657 | 0.7566 | 410 | 1.1235 | 22820048 |
0.3721 | 0.7659 | 415 | 1.1242 | 23096928 |
0.2776 | 0.7751 | 420 | 1.1227 | 23369624 |
0.2669 | 0.7843 | 425 | 1.1249 | 23652232 |
0.3584 | 0.7935 | 430 | 1.1227 | 23931024 |
0.4058 | 0.8028 | 435 | 1.1194 | 24211728 |
0.271 | 0.8120 | 440 | 1.1246 | 24490376 |
0.2958 | 0.8212 | 445 | 1.1206 | 24772424 |
0.2507 | 0.8304 | 450 | 1.1214 | 25054744 |
0.3209 | 0.8397 | 455 | 1.1193 | 25331320 |
0.2983 | 0.8489 | 460 | 1.1173 | 25606720 |
0.302 | 0.8581 | 465 | 1.1181 | 25890600 |
0.4136 | 0.8674 | 470 | 1.1165 | 26167160 |
0.3069 | 0.8766 | 475 | 1.1179 | 26448160 |
0.2351 | 0.8858 | 480 | 1.1173 | 26723544 |
0.2373 | 0.8950 | 485 | 1.1175 | 27006408 |
0.3894 | 0.9043 | 490 | 1.1146 | 27281088 |
0.277 | 0.9135 | 495 | 1.1174 | 27562296 |
0.3009 | 0.9227 | 500 | 1.1151 | 27833952 |
0.3229 | 0.9319 | 505 | 1.1139 | 28106704 |
0.2891 | 0.9412 | 510 | 1.1161 | 28385768 |
0.2745 | 0.9504 | 515 | 1.1128 | 28670136 |
0.3377 | 0.9596 | 520 | 1.1158 | 28953688 |
0.3045 | 0.9689 | 525 | 1.1126 | 29230304 |
0.2475 | 0.9781 | 530 | 1.1150 | 29509224 |
0.2633 | 0.9873 | 535 | 1.1121 | 29791512 |
0.2622 | 0.9965 | 540 | 1.1105 | 30074936 |
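
Assuming the reported loss is the standard per-token cross-entropy in nats (as with the default causal-LM objective in Transformers), it can be converted to perplexity; a quick check on the final evaluation loss:

```python
import math

final_eval_loss = 1.1114  # final evaluation loss reported above
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 3.04
```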
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1