---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter7_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter7_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.5982
- Num input tokens seen: 8186704
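To try the checkpoint, here is a minimal generation sketch with 🤗 Transformers. The repo id `jkazdan/collapse_gemma-2-2b_hs2_replace_iter7_sftsd1` is an assumption inferred from the model name; downloading the weights requires Hub access.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model name above.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter7_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation from a short prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```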

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
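As a sanity check on the hyperparameters above, the effective (total) train batch size is the per-device batch size multiplied by the gradient accumulation steps; a minimal sketch:

```python
# Per-device batch size and accumulation steps, as listed in the card.
train_batch_size = 8
gradient_accumulation_steps = 16

# Each optimizer step accumulates gradients over 16 micro-batches of 8,
# giving the total_train_batch_size of 128 reported above.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 128
```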

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5901        | 0.0315 | 5    | 1.3093          | 247976            |
| 1.1237        | 0.0630 | 10   | 1.2401          | 505504            |
| 0.7826        | 0.0945 | 15   | 1.3165          | 763072            |
| 0.5963        | 0.1259 | 20   | 1.4831          | 1022728           |
| 0.382         | 0.1574 | 25   | 1.6187          | 1285000           |
| 0.1789        | 0.1889 | 30   | 1.7936          | 1548016           |
| 0.0997        | 0.2204 | 35   | 1.9666          | 1810296           |
| 0.0771        | 0.2519 | 40   | 2.1144          | 2073776           |
| 0.0761        | 0.2834 | 45   | 2.2913          | 2332728           |
| 0.0391        | 0.3148 | 50   | 2.3861          | 2581064           |
| 0.039         | 0.3463 | 55   | 2.4478          | 2846560           |
| 0.0371        | 0.3778 | 60   | 2.4824          | 3112600           |
| 0.0303        | 0.4093 | 65   | 2.5257          | 3369440           |
| 0.0292        | 0.4408 | 70   | 2.5481          | 3635616           |
| 0.0254        | 0.4723 | 75   | 2.5834          | 3900456           |
| 0.0275        | 0.5037 | 80   | 2.6008          | 4152824           |
| 0.0304        | 0.5352 | 85   | 2.6085          | 4412552           |
| 0.03          | 0.5667 | 90   | 2.5923          | 4674816           |
| 0.026         | 0.5982 | 95   | 2.5989          | 4933192           |
| 0.0247        | 0.6297 | 100  | 2.5903          | 5189528           |
| 0.0267        | 0.6612 | 105  | 2.5938          | 5452936           |
| 0.0252        | 0.6926 | 110  | 2.6102          | 5704000           |
| 0.03          | 0.7241 | 115  | 2.6064          | 5962520           |
| 0.0242        | 0.7556 | 120  | 2.5853          | 6217880           |
| 0.0231        | 0.7871 | 125  | 2.5819          | 6473072           |
| 0.0259        | 0.8186 | 130  | 2.5811          | 6731432           |
| 0.024         | 0.8501 | 135  | 2.5730          | 6995184           |
| 0.025         | 0.8815 | 140  | 2.5772          | 7252184           |
| 0.0261        | 0.9130 | 145  | 2.5934          | 7508144           |
| 0.0252        | 0.9445 | 150  | 2.5982          | 7769400           |
| 0.0309        | 0.9760 | 155  | 2.5964          | 8032040           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1