
collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1114
  • Num Input Tokens Seen: 30129080
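Usage details are not documented in this card. As a minimal sketch (not an official example), the checkpoint loads like any other causal language model with the Transformers library; the model id below assumes the Hub repository jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0:

```python
# Minimal loading/inference sketch. The model id assumes the checkpoint is
# hosted at jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are stored in BF16
    device_map="auto",
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```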

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
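No training script is published with this card. As a rough sketch, the settings above map onto a Hugging Face TrainingArguments configuration along the following lines; the output directory and the bf16 flag are assumptions, and the Adam betas and epsilon listed above are the library defaults:

```python
# Sketch of a TrainingArguments configuration matching the hyperparameters
# above. total_train_batch_size (128) is implied by 8 per-device samples
# x 16 gradient-accumulation steps; output_dir and bf16 are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: consistent with the BF16 checkpoint
)
```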

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3956 0
1.6587 0.0092 5 1.3863 276696
1.5356 0.0185 10 1.3193 550488
1.4519 0.0277 15 1.2505 831264
1.3766 0.0369 20 1.1950 1107760
1.3159 0.0461 25 1.1635 1389896
1.1803 0.0554 30 1.1441 1668904
1.1339 0.0646 35 1.1371 1949864
0.9903 0.0738 40 1.1480 2233824
1.0179 0.0830 45 1.1589 2509936
0.9916 0.0923 50 1.1672 2793040
0.9015 0.1015 55 1.1839 3068680
0.8562 0.1107 60 1.1852 3347056
0.8485 0.1200 65 1.1998 3627512
0.7508 0.1292 70 1.2026 3905976
0.7357 0.1384 75 1.2045 4179328
0.6496 0.1476 80 1.1934 4453160
0.7891 0.1569 85 1.1950 4735096
0.5708 0.1661 90 1.1959 5015384
0.607 0.1753 95 1.2026 5284160
0.5427 0.1845 100 1.1955 5555648
0.4434 0.1938 105 1.1935 5839504
0.4716 0.2030 110 1.1997 6113904
0.5612 0.2122 115 1.1869 6394080
0.5522 0.2215 120 1.1934 6668968
0.4752 0.2307 125 1.1917 6943216
0.3948 0.2399 130 1.1873 7224944
0.4525 0.2491 135 1.1890 7499080
0.5147 0.2584 140 1.1814 7773104
0.4881 0.2676 145 1.1917 8050400
0.3915 0.2768 150 1.1842 8332168
0.4032 0.2860 155 1.1897 8608296
0.4227 0.2953 160 1.1804 8887936
0.4128 0.3045 165 1.1838 9164856
0.4097 0.3137 170 1.1759 9448376
0.3663 0.3230 175 1.1841 9721256
0.4311 0.3322 180 1.1780 9999808
0.3765 0.3414 185 1.1763 10273504
0.4953 0.3506 190 1.1663 10554248
0.3491 0.3599 195 1.1760 10835664
0.5705 0.3691 200 1.1670 11117936
0.3433 0.3783 205 1.1677 11394272
0.366 0.3875 210 1.1675 11667112
0.3678 0.3968 215 1.1643 11940344
0.3999 0.4060 220 1.1664 12226416
0.2779 0.4152 225 1.1623 12509896
0.2937 0.4245 230 1.1625 12789696
0.3232 0.4337 235 1.1577 13067376
0.2727 0.4429 240 1.1603 13347168
0.4066 0.4521 245 1.1549 13623832
0.3169 0.4614 250 1.1554 13902696
0.3345 0.4706 255 1.1557 14188000
0.3015 0.4798 260 1.1543 14470712
0.3465 0.4890 265 1.1519 14746408
0.3225 0.4983 270 1.1479 15018640
0.2737 0.5075 275 1.1483 15296928
0.3426 0.5167 280 1.1429 15574408
0.3332 0.5260 285 1.1446 15847000
0.2775 0.5352 290 1.1413 16126256
0.3818 0.5444 295 1.1398 16403872
0.402 0.5536 300 1.1409 16683200
0.3527 0.5629 305 1.1387 16957856
0.3747 0.5721 310 1.1381 17237088
0.2767 0.5813 315 1.1398 17514672
0.397 0.5905 320 1.1353 17790912
0.2713 0.5998 325 1.1355 18067224
0.3836 0.6090 330 1.1335 18345448
0.2953 0.6182 335 1.1340 18625288
0.3032 0.6275 340 1.1339 18895360
0.3337 0.6367 345 1.1315 19176592
0.2324 0.6459 350 1.1368 19456384
0.3954 0.6551 355 1.1290 19736048
0.3867 0.6644 360 1.1316 20017992
0.2376 0.6736 365 1.1317 20299128
0.2497 0.6828 370 1.1302 20572064
0.2433 0.6920 375 1.1295 20847344
0.3257 0.7013 380 1.1262 21131912
0.3596 0.7105 385 1.1299 21410128
0.3307 0.7197 390 1.1261 21691144
0.3911 0.7290 395 1.1277 21972080
0.3247 0.7382 400 1.1245 22254672
0.3654 0.7474 405 1.1262 22539544
0.2657 0.7566 410 1.1235 22820048
0.3721 0.7659 415 1.1242 23096928
0.2776 0.7751 420 1.1227 23369624
0.2669 0.7843 425 1.1249 23652232
0.3584 0.7935 430 1.1227 23931024
0.4058 0.8028 435 1.1194 24211728
0.271 0.8120 440 1.1246 24490376
0.2958 0.8212 445 1.1206 24772424
0.2507 0.8304 450 1.1214 25054744
0.3209 0.8397 455 1.1193 25331320
0.2983 0.8489 460 1.1173 25606720
0.302 0.8581 465 1.1181 25890600
0.4136 0.8674 470 1.1165 26167160
0.3069 0.8766 475 1.1179 26448160
0.2351 0.8858 480 1.1173 26723544
0.2373 0.8950 485 1.1175 27006408
0.3894 0.9043 490 1.1146 27281088
0.277 0.9135 495 1.1174 27562296
0.3009 0.9227 500 1.1151 27833952
0.3229 0.9319 505 1.1139 28106704
0.2891 0.9412 510 1.1161 28385768
0.2745 0.9504 515 1.1128 28670136
0.3377 0.9596 520 1.1158 28953688
0.3045 0.9689 525 1.1126 29230304
0.2475 0.9781 530 1.1150 29509224
0.2633 0.9873 535 1.1121 29791512
0.2622 0.9965 540 1.1105 30074936

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1