collapse_gemma-2-2b_hs2_replace_iter10_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6482
  • Num Input Tokens Seen: 7747456
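
To put the reported loss in more familiar terms, it can be converted to perplexity. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the helper name `perplexity` is made up for illustration.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Final evaluation loss reported in this card.
final_eval_loss = 2.6482

# Assumption: the loss is averaged per token and measured in nats.
final_ppl = perplexity(final_eval_loss)
print(f"perplexity at loss {final_eval_loss}: {final_ppl:.2f}")
```

Under that assumption, a loss of 2.6482 corresponds to a perplexity of about 14.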

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
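
As a rough illustration of how these values relate, the sketch below restates them as a plain dictionary, checks the total batch size, and evaluates a constant-with-warmup learning rate. The key names mirror `transformers.TrainingArguments` parameters. Mapping `train_batch_size` to a per-device value assumes a single device, and the total of 159 optimizer steps is only an estimate from the training log (step 155 at epoch 0.9760); neither is stated in this card. The helper functions are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
# Assumption: train_batch_size is a per-device value on a single device.
training_kwargs = {
    "learning_rate": 8e-06,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "seed": 2,
    "gradient_accumulation_steps": 16,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
}

NUM_DEVICES = 1            # assumption, not stated in the card
ESTIMATED_TOTAL_STEPS = 159  # assumption, estimated as 155 / 0.9760


def effective_batch_size(kwargs: dict, num_devices: int = NUM_DEVICES) -> int:
    """Sequences consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def constant_with_warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(training_kwargs)
warmup_steps = math.ceil(ESTIMATED_TOTAL_STEPS * training_kwargs["warmup_ratio"])

print("total_train_batch_size:", total_batch)
print("estimated warmup steps:", warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, 100):
    lr = constant_with_warmup_lr(step, training_kwargs["learning_rate"], warmup_steps)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

With 8 sequences per device, 16 accumulation steps, and one device, the product matches the listed `total_train_batch_size` of 128.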

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6727 | 0.0315 | 5 | 1.3102 | 246184 |
| 1.009 | 0.0630 | 10 | 1.2543 | 489448 |
| 0.667 | 0.0945 | 15 | 1.3838 | 726824 |
| 0.4294 | 0.1259 | 20 | 1.5708 | 970024 |
| 0.2248 | 0.1574 | 25 | 1.6675 | 1212248 |
| 0.1908 | 0.1889 | 30 | 1.8604 | 1467208 |
| 0.1451 | 0.2204 | 35 | 2.0014 | 1710152 |
| 0.0621 | 0.2519 | 40 | 2.1971 | 1954192 |
| 0.0381 | 0.2834 | 45 | 2.3063 | 2191576 |
| 0.0367 | 0.3148 | 50 | 2.3949 | 2433592 |
| 0.036 | 0.3463 | 55 | 2.4774 | 2679696 |
| 0.0294 | 0.3778 | 60 | 2.5588 | 2928552 |
| 0.0283 | 0.4093 | 65 | 2.5792 | 3173464 |
| 0.0285 | 0.4408 | 70 | 2.6130 | 3413776 |
| 0.0246 | 0.4723 | 75 | 2.6031 | 3659144 |
| 0.0239 | 0.5037 | 80 | 2.6188 | 3912088 |
| 0.023 | 0.5352 | 85 | 2.6231 | 4148400 |
| 0.0251 | 0.5667 | 90 | 2.5840 | 4398984 |
| 0.0236 | 0.5982 | 95 | 2.5662 | 4651040 |
| 0.0264 | 0.6297 | 100 | 2.5629 | 4894920 |
| 0.0243 | 0.6612 | 105 | 2.5727 | 5137152 |
| 0.0256 | 0.6926 | 110 | 2.5955 | 5378304 |
| 0.0235 | 0.7241 | 115 | 2.6078 | 5624672 |
| 0.0242 | 0.7556 | 120 | 2.6111 | 5877704 |
| 0.024 | 0.7871 | 125 | 2.6151 | 6124640 |
| 0.0265 | 0.8186 | 130 | 2.6286 | 6367576 |
| 0.0224 | 0.8501 | 135 | 2.6392 | 6614328 |
| 0.0242 | 0.8815 | 140 | 2.6356 | 6856504 |
| 0.023 | 0.9130 | 145 | 2.6439 | 7105832 |
| 0.0238 | 0.9445 | 150 | 2.6567 | 7354200 |
| 0.0244 | 0.9760 | 155 | 2.6456 | 7601504 |
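
One way to read the training log above is to find the evaluation step with the lowest validation loss and compare training and validation loss over time. The sketch below does this on a subset of rows copied from the log; the helper names are hypothetical, and the choice of rows is only for brevity.

```python
# (step, training_loss, validation_loss) for a subset of rows from the log.
# The step-0 row has no logged training loss, so it is stored as None.
rows = [
    (0, None, 1.3956),
    (5, 1.6727, 1.3102),
    (10, 1.009, 1.2543),
    (15, 0.667, 1.3838),
    (20, 0.4294, 1.5708),
    (40, 0.0621, 2.1971),
    (80, 0.0239, 2.6188),
    (155, 0.0244, 2.6456),
]


def best_row(log):
    """Return the row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def generalization_gaps(log):
    """Return (step, validation_loss - training_loss) where both are logged."""
    return [(step, val - train) for step, train, val in log if train is not None]


best_step, _, best_val = best_row(rows)
print(f"lowest validation loss {best_val} at step {best_step}")

for step, gap in generalization_gaps(rows):
    print(f"step {step:>3}: validation - training = {gap:+.4f}")
```

On these rows the validation loss is lowest at step 10 and rises afterwards while the training loss keeps falling, so the gap between the two widens over the rest of the run.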

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter10_sftsd2

  • Base model: google/gemma-2-2b
  • Finetuned: this model