---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter8_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter8_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5431
  • Num Input Tokens Seen: 7729792
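
A minimal loading sketch with transformers is shown below. The repo id is an assumption inferred from the model name; this card does not confirm where the checkpoint is hosted.

```python
# Minimal inference sketch. The repo id is inferred from the model name
# above and is an assumption, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter8_sftsd2"  # assumed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```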

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
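
Given the trl and sft tags, a plausible reconstruction with TRL's SFTTrainer is sketched below. The dataset is unknown (see above), so the data-loading line is a placeholder, and the SFTConfig field names assume a recent TRL release; treat this as a sketch, not the actual training script.

```python
# Hypothetical reconstruction of the training setup from the listed
# hyperparameters. The dataset path is a placeholder: the card only
# says the model was trained on "an unknown dataset".
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("json", data_files="train.jsonl")["train"]  # placeholder

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter8_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```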

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6775        | 0.0317 | 5    | 1.3078          | 249640            |
| 1.1595        | 0.0633 | 10   | 1.2385          | 493152            |
| 0.7855        | 0.0950 | 15   | 1.3530          | 742568            |
| 0.4733        | 0.1267 | 20   | 1.5512          | 993096            |
| 0.3632        | 0.1584 | 25   | 1.6760          | 1241784           |
| 0.2405        | 0.1900 | 30   | 1.8061          | 1488408           |
| 0.1741        | 0.2217 | 35   | 2.0582          | 1736448           |
| 0.1075        | 0.2534 | 40   | 2.1692          | 1986304           |
| 0.054         | 0.2850 | 45   | 2.3143          | 2236192           |
| 0.0417        | 0.3167 | 50   | 2.4076          | 2485904           |
| 0.0317        | 0.3484 | 55   | 2.4662          | 2728784           |
| 0.0307        | 0.3800 | 60   | 2.4682          | 2981024           |
| 0.0281        | 0.4117 | 65   | 2.4533          | 3224056           |
| 0.0296        | 0.4434 | 70   | 2.4544          | 3466368           |
| 0.0281        | 0.4751 | 75   | 2.4627          | 3716376           |
| 0.029         | 0.5067 | 80   | 2.4801          | 3956352           |
| 0.0276        | 0.5384 | 85   | 2.5255          | 4200848           |
| 0.0266        | 0.5701 | 90   | 2.5232          | 4438112           |
| 0.0266        | 0.6017 | 95   | 2.5296          | 4681008           |
| 0.0271        | 0.6334 | 100  | 2.5345          | 4929272           |
| 0.026         | 0.6651 | 105  | 2.5361          | 5177360           |
| 0.0269        | 0.6968 | 110  | 2.5346          | 5422624           |
| 0.031         | 0.7284 | 115  | 2.5489          | 5664432           |
| 0.0277        | 0.7601 | 120  | 2.5383          | 5911992           |
| 0.0269        | 0.7918 | 125  | 2.5107          | 6157840           |
| 0.0245        | 0.8234 | 130  | 2.5145          | 6401608           |
| 0.0284        | 0.8551 | 135  | 2.5238          | 6639192           |
| 0.0269        | 0.8868 | 140  | 2.5150          | 6886080           |
| 0.0276        | 0.9184 | 145  | 2.5151          | 7135760           |
| 0.026         | 0.9501 | 150  | 2.5276          | 7383616           |
| 0.0259        | 0.9818 | 155  | 2.5393          | 7629864           |
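
For intuition, a cross-entropy loss converts to per-token perplexity via exp(loss), so the final evaluation loss of 2.5431 corresponds to a perplexity of roughly 12.7:

```python
import math

# Perplexity implied by the final cross-entropy evaluation loss.
print(math.exp(2.5431))  # ~12.72
```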

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
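
To match this environment before loading or retraining, a quick version check (package names as imported in Python):

```python
# Verify installed versions against the ones listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.44.0"
assert torch.__version__.startswith("2.4.0")  # build with CUDA 12.1 (cu121)
assert datasets.__version__ == "2.20.0"
assert tokenizers.__version__ == "0.19.1"
```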