---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), trained with TRL's supervised fine-tuning (SFT) trainer on an undocumented dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1156 (perplexity ≈ exp(1.1156) ≈ 3.05)
- Num Input Tokens Seen: 38394432
## Model description
This checkpoint is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) produced with TRL's `SFTTrainer` over a single epoch. Judging by the checkpoint name, it is the fifth iteration of an iterative "accumulate" fine-tuning run with seed 1; the details of that experiment are not documented here.
## Intended uses & limitations
More information needed
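No usage instructions are documented either. As a starting point, here is a minimal inference sketch, assuming the checkpoint is hosted at the repo id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1` (inferred from the model name; substitute the actual path) and that the Gemma license has been accepted:

```python
# Minimal inference sketch. The repo id below is an assumption inferred from
# the model name; adjust it to the actual checkpoint path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"  # assumed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 weights are commonly run in bf16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```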
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
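
As a sketch, these values map onto `transformers.TrainingArguments` (which TRL's `SFTTrainer` consumes) roughly as follows; the `output_dir` and dataset are placeholders, since the originals are not documented:

```python
# Rough reconstruction of the training configuration listed above.
# output_dir and the dataset are placeholders; the original run's are undocumented.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective batch (one device assumed)
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam defaults, as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# A TRL run would then pair this with something like:
#   trainer = SFTTrainer(model="google/gemma-2-2b", args=args, train_dataset=...)
```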
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.64 | 0.0071 | 5 | 1.3915 | 282928 |
| 1.717 | 0.0142 | 10 | 1.3495 | 547680 |
| 1.4756 | 0.0214 | 15 | 1.2809 | 819464 |
| 1.3413 | 0.0285 | 20 | 1.2255 | 1088176 |
| 1.2434 | 0.0356 | 25 | 1.1810 | 1359440 |
| 1.2176 | 0.0427 | 30 | 1.1672 | 1625784 |
| 1.2541 | 0.0499 | 35 | 1.1491 | 1899896 |
| 0.9819 | 0.0570 | 40 | 1.1533 | 2176760 |
| 0.947 | 0.0641 | 45 | 1.1622 | 2458784 |
| 0.8886 | 0.0712 | 50 | 1.1769 | 2731336 |
| 0.7859 | 0.0784 | 55 | 1.2131 | 3004608 |
| 0.7724 | 0.0855 | 60 | 1.2111 | 3276648 |
| 0.8257 | 0.0926 | 65 | 1.2124 | 3552744 |
| 0.7196 | 0.0997 | 70 | 1.2153 | 3828616 |
| 0.7089 | 0.1068 | 75 | 1.2123 | 4108840 |
| 0.7354 | 0.1140 | 80 | 1.2026 | 4391920 |
| 0.6275 | 0.1211 | 85 | 1.2205 | 4674200 |
| 0.5129 | 0.1282 | 90 | 1.2144 | 4945712 |
| 0.4506 | 0.1353 | 95 | 1.2009 | 5214520 |
| 0.5107 | 0.1425 | 100 | 1.2186 | 5484592 |
| 0.4638 | 0.1496 | 105 | 1.2054 | 5752320 |
| 0.4786 | 0.1567 | 110 | 1.2011 | 6028136 |
| 0.5751 | 0.1638 | 115 | 1.2009 | 6304032 |
| 0.4034 | 0.1710 | 120 | 1.2037 | 6579840 |
| 0.3894 | 0.1781 | 125 | 1.1952 | 6855056 |
| 0.4096 | 0.1852 | 130 | 1.1990 | 7132912 |
| 0.486 | 0.1923 | 135 | 1.1961 | 7401704 |
| 0.3722 | 0.1994 | 140 | 1.1943 | 7674144 |
| 0.3758 | 0.2066 | 145 | 1.1971 | 7955296 |
| 0.3871 | 0.2137 | 150 | 1.1955 | 8232712 |
| 0.3788 | 0.2208 | 155 | 1.1905 | 8504176 |
| 0.3235 | 0.2279 | 160 | 1.1879 | 8779072 |
| 0.3315 | 0.2351 | 165 | 1.1902 | 9059672 |
| 0.328 | 0.2422 | 170 | 1.1905 | 9336368 |
| 0.3476 | 0.2493 | 175 | 1.1880 | 9601712 |
| 0.2789 | 0.2564 | 180 | 1.1829 | 9871144 |
| 0.2937 | 0.2636 | 185 | 1.1835 | 10137584 |
| 0.3359 | 0.2707 | 190 | 1.1815 | 10406656 |
| 0.3616 | 0.2778 | 195 | 1.1803 | 10677608 |
| 0.3162 | 0.2849 | 200 | 1.1794 | 10948264 |
| 0.3174 | 0.2920 | 205 | 1.1750 | 11218000 |
| 0.2904 | 0.2992 | 210 | 1.1806 | 11498160 |
| 0.3929 | 0.3063 | 215 | 1.1692 | 11779608 |
| 0.2965 | 0.3134 | 220 | 1.1731 | 12049808 |
| 0.4205 | 0.3205 | 225 | 1.1692 | 12326136 |
| 0.2849 | 0.3277 | 230 | 1.1736 | 12596680 |
| 0.3107 | 0.3348 | 235 | 1.1665 | 12869960 |
| 0.2267 | 0.3419 | 240 | 1.1724 | 13145648 |
| 0.2392 | 0.3490 | 245 | 1.1708 | 13415312 |
| 0.1885 | 0.3562 | 250 | 1.1657 | 13690584 |
| 0.2722 | 0.3633 | 255 | 1.1676 | 13968448 |
| 0.2161 | 0.3704 | 260 | 1.1651 | 14239944 |
| 0.1734 | 0.3775 | 265 | 1.1659 | 14510952 |
| 0.3554 | 0.3846 | 270 | 1.1580 | 14780912 |
| 0.316 | 0.3918 | 275 | 1.1608 | 15055568 |
| 0.2742 | 0.3989 | 280 | 1.1562 | 15334424 |
| 0.1887 | 0.4060 | 285 | 1.1580 | 15606264 |
| 0.3007 | 0.4131 | 290 | 1.1570 | 15876168 |
| 0.1913 | 0.4203 | 295 | 1.1507 | 16146352 |
| 0.2763 | 0.4274 | 300 | 1.1523 | 16420864 |
| 0.3037 | 0.4345 | 305 | 1.1499 | 16693096 |
| 0.1839 | 0.4416 | 310 | 1.1526 | 16976408 |
| 0.2314 | 0.4488 | 315 | 1.1499 | 17252728 |
| 0.2425 | 0.4559 | 320 | 1.1526 | 17521216 |
| 0.2362 | 0.4630 | 325 | 1.1487 | 17788696 |
| 0.2139 | 0.4701 | 330 | 1.1502 | 18057744 |
| 0.2801 | 0.4773 | 335 | 1.1443 | 18332304 |
| 0.3707 | 0.4844 | 340 | 1.1458 | 18610592 |
| 0.2548 | 0.4915 | 345 | 1.1450 | 18881784 |
| 0.2455 | 0.4986 | 350 | 1.1418 | 19146128 |
| 0.2278 | 0.5057 | 355 | 1.1452 | 19420384 |
| 0.2771 | 0.5129 | 360 | 1.1420 | 19696584 |
| 0.2731 | 0.5200 | 365 | 1.1394 | 19967720 |
| 0.219 | 0.5271 | 370 | 1.1415 | 20241272 |
| 0.2432 | 0.5342 | 375 | 1.1457 | 20514896 |
| 0.1841 | 0.5414 | 380 | 1.1429 | 20779312 |
| 0.2617 | 0.5485 | 385 | 1.1404 | 21056016 |
| 0.2928 | 0.5556 | 390 | 1.1404 | 21327080 |
| 0.1952 | 0.5627 | 395 | 1.1354 | 21598992 |
| 0.227 | 0.5699 | 400 | 1.1381 | 21877208 |
| 0.2218 | 0.5770 | 405 | 1.1380 | 22149176 |
| 0.1683 | 0.5841 | 410 | 1.1375 | 22423056 |
| 0.3227 | 0.5912 | 415 | 1.1348 | 22693424 |
| 0.3058 | 0.5983 | 420 | 1.1357 | 22966920 |
| 0.1881 | 0.6055 | 425 | 1.1341 | 23246936 |
| 0.2359 | 0.6126 | 430 | 1.1314 | 23522192 |
| 0.2074 | 0.6197 | 435 | 1.1307 | 23801944 |
| 0.2584 | 0.6268 | 440 | 1.1328 | 24074328 |
| 0.2027 | 0.6340 | 445 | 1.1289 | 24348328 |
| 0.2897 | 0.6411 | 450 | 1.1305 | 24623816 |
| 0.2167 | 0.6482 | 455 | 1.1309 | 24902928 |
| 0.3028 | 0.6553 | 460 | 1.1306 | 25174984 |
| 0.2939 | 0.6625 | 465 | 1.1287 | 25447728 |
| 0.2679 | 0.6696 | 470 | 1.1262 | 25716008 |
| 0.3617 | 0.6767 | 475 | 1.1275 | 25994912 |
| 0.3261 | 0.6838 | 480 | 1.1266 | 26270048 |
| 0.2113 | 0.6909 | 485 | 1.1270 | 26541616 |
| 0.3059 | 0.6981 | 490 | 1.1287 | 26818200 |
| 0.2356 | 0.7052 | 495 | 1.1242 | 27087272 |
| 0.2931 | 0.7123 | 500 | 1.1246 | 27359208 |
| 0.2421 | 0.7194 | 505 | 1.1233 | 27638688 |
| 0.2792 | 0.7266 | 510 | 1.1252 | 27911800 |
| 0.2415 | 0.7337 | 515 | 1.1214 | 28186904 |
| 0.292 | 0.7408 | 520 | 1.1222 | 28462520 |
| 0.2697 | 0.7479 | 525 | 1.1214 | 28740360 |
| 0.2745 | 0.7551 | 530 | 1.1196 | 29013592 |
| 0.2365 | 0.7622 | 535 | 1.1221 | 29285096 |
| 0.2456 | 0.7693 | 540 | 1.1199 | 29557536 |
| 0.2182 | 0.7764 | 545 | 1.1208 | 29835096 |
| 0.3136 | 0.7835 | 550 | 1.1219 | 30112088 |
| 0.184 | 0.7907 | 555 | 1.1167 | 30387312 |
| 0.2508 | 0.7978 | 560 | 1.1200 | 30659104 |
| 0.2854 | 0.8049 | 565 | 1.1208 | 30939024 |
| 0.2423 | 0.8120 | 570 | 1.1186 | 31214856 |
| 0.3061 | 0.8192 | 575 | 1.1174 | 31487176 |
| 0.2599 | 0.8263 | 580 | 1.1176 | 31758936 |
| 0.1641 | 0.8334 | 585 | 1.1192 | 32029768 |
| 0.3293 | 0.8405 | 590 | 1.1180 | 32306824 |
| 0.1687 | 0.8477 | 595 | 1.1187 | 32583424 |
| 0.2466 | 0.8548 | 600 | 1.1157 | 32855528 |
| 0.2684 | 0.8619 | 605 | 1.1151 | 33131344 |
| 0.2623 | 0.8690 | 610 | 1.1156 | 33412888 |
| 0.3949 | 0.8761 | 615 | 1.1167 | 33688992 |
| 0.2317 | 0.8833 | 620 | 1.1167 | 33963096 |
| 0.2483 | 0.8904 | 625 | 1.1147 | 34243336 |
| 0.3731 | 0.8975 | 630 | 1.1142 | 34521472 |
| 0.2577 | 0.9046 | 635 | 1.1143 | 34794832 |
| 0.2225 | 0.9118 | 640 | 1.1139 | 35064072 |
| 0.1567 | 0.9189 | 645 | 1.1146 | 35342008 |
| 0.3207 | 0.9260 | 650 | 1.1146 | 35610720 |
| 0.1626 | 0.9331 | 655 | 1.1153 | 35880752 |
| 0.2122 | 0.9403 | 660 | 1.1138 | 36156864 |
| 0.2865 | 0.9474 | 665 | 1.1110 | 36433816 |
| 0.2319 | 0.9545 | 670 | 1.1134 | 36713952 |
| 0.1696 | 0.9616 | 675 | 1.1129 | 36980552 |
| 0.2326 | 0.9687 | 680 | 1.1120 | 37256536 |
| 0.2783 | 0.9759 | 685 | 1.1133 | 37524184 |
| 0.2046 | 0.9830 | 690 | 1.1113 | 37805352 |
| 0.2798 | 0.9901 | 695 | 1.1119 | 38079104 |
| 0.2794 | 0.9972 | 700 | 1.1159 | 38340280 |
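
Reading the table: validation loss drops sharply over the first ~30 steps, rises briefly around steps 55-100 while training loss keeps falling, then declines slowly to about 1.11 by the end of the epoch. A minimal sketch for eyeballing the curve, using a few rows transcribed from the table (matplotlib assumed available):

```python
# Plot a subset of the logged validation losses from the table above.
import matplotlib.pyplot as plt

steps = [0, 25, 55, 100, 200, 300, 400, 500, 600, 700]
losses = [1.3956, 1.1810, 1.2131, 1.2186, 1.1794,
          1.1523, 1.1381, 1.1246, 1.1157, 1.1159]

plt.plot(steps, losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1 eval loss")
plt.show()
```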
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1