library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-most-similar
    results: []

OpenELM-1_1B-DPO-full-most-similar

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2575
  • Rewards/chosen: -6.8438
  • Rewards/rejected: -7.25
  • Rewards/accuracies: 0.5215
  • Rewards/margins: 0.3887
  • Logps/rejected: -1012.0
  • Logps/chosen: -1004.0
  • Logits/rejected: -5.0
  • Logits/chosen: -6.125
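The reward, margin, and accuracy figures above follow the usual DPO bookkeeping: each "reward" is a scaled log-probability ratio between the trained policy and a frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. The sketch below illustrates those definitions for a single preference pair. It assumes the standard sigmoid DPO loss and `beta = 0.1`; neither is recorded in this card, and the helper name `dpo_stats` and the log-probabilities in the usage lines are purely illustrative.

```python
import math


def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO quantities for one (chosen, rejected) example.

    beta=0.1 is an assumed value; the card does not record the one used.
    """
    # Implicit rewards: scaled log-probability ratio of policy vs. reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss, -log(sigmoid(margin)), written as a stable softplus.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "accurate": margin > 0,  # pair counts toward Rewards/accuracies
        "loss": loss,
    }


# Illustrative log-probabilities only, not values from this training run.
example = dpo_stats(-100.0, -104.0, -98.0, -99.0)
print(example)
```

With a margin of zero the loss equals ln 2 (about 0.693), which is the value an untrained policy starts from. Rewards/accuracies is the fraction of evaluation pairs for which the margin is positive.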

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
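The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and gives a rough picture of a cosine schedule with 10% linear warmup. The schedule function is a simplified stand-in for the usual warmup-plus-half-cosine shape, not the exact library code, and `ESTIMATED_TOTAL_STEPS` is an assumption inferred from the step and epoch columns reported under Training results (step 100 at epoch 0.1047 suggests roughly 955 optimizer steps per epoch).

```python
import math

# Values taken from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 16
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-5
WARMUP_RATIO = 0.1

# Effective batch sizes: evaluation does not use gradient accumulation.
total_train_batch_size = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: about 955 steps per epoch over 3 epochs (not stated in the card).
ESTIMATED_TOTAL_STEPS = 2866


def warmup_steps_for(total_steps, warmup_ratio=WARMUP_RATIO):
    """Number of linear warmup steps for a given schedule length."""
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup = warmup_steps_for(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for s in (0, 100, 1000, 2000, ESTIMATED_TOTAL_STEPS):
    print(s, cosine_lr(s, ESTIMATED_TOTAL_STEPS))
```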

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6224 | 0.1047 | 100 | 0.6811 | -0.4375 | -0.4980 | 0.5566 | 0.0618 | -338.0 | -362.0 | -12.25 | -12.5 |
| 0.624 | 0.2093 | 200 | 0.6957 | -0.8281 | -0.8867 | 0.5176 | 0.0569 | -378.0 | -402.0 | -11.0 | -11.4375 |
| 0.6062 | 0.3140 | 300 | 0.7007 | -0.5938 | -0.6289 | 0.5078 | 0.0337 | -352.0 | -378.0 | -12.25 | -12.4375 |
| 0.6334 | 0.4186 | 400 | 0.7011 | -1.2656 | -1.3438 | 0.5176 | 0.0815 | -424.0 | -444.0 | -11.375 | -11.75 |
| 0.6236 | 0.5233 | 500 | 0.7273 | -1.1172 | -1.1875 | 0.5527 | 0.0659 | -408.0 | -430.0 | -11.4375 | -11.8125 |
| 0.648 | 0.6279 | 600 | 0.6997 | -1.3438 | -1.3984 | 0.5059 | 0.0508 | -428.0 | -452.0 | -13.75 | -13.625 |
| 0.6131 | 0.7326 | 700 | 0.7108 | -1.4922 | -1.5312 | 0.5293 | 0.0396 | -442.0 | -468.0 | -12.8125 | -12.625 |
| 0.621 | 0.8373 | 800 | 0.7204 | -1.3516 | -1.4141 | 0.5371 | 0.0581 | -430.0 | -454.0 | -14.0625 | -14.0625 |
| 0.6114 | 0.9419 | 900 | 0.7060 | -1.6797 | -1.8125 | 0.5371 | 0.1328 | -470.0 | -486.0 | -13.875 | -13.8125 |
| 0.1659 | 1.0466 | 1000 | 0.8400 | -2.9688 | -3.2188 | 0.5645 | 0.2520 | -608.0 | -616.0 | -7.5 | -8.5625 |
| 0.1767 | 1.1512 | 1100 | 0.9194 | -3.0781 | -3.2188 | 0.5156 | 0.1406 | -612.0 | -624.0 | -12.125 | -13.0 |
| 0.1574 | 1.2559 | 1200 | 0.9110 | -3.8125 | -4.0938 | 0.5332 | 0.2715 | -696.0 | -700.0 | -11.9375 | -12.75 |
| 0.1637 | 1.3605 | 1300 | 0.8868 | -3.5312 | -3.7656 | 0.5410 | 0.2314 | -664.0 | -672.0 | -11.25 | -12.0 |
| 0.1275 | 1.4652 | 1400 | 0.9276 | -3.7031 | -3.9844 | 0.5488 | 0.2754 | -688.0 | -688.0 | -9.0625 | -10.1875 |
| 0.1468 | 1.5699 | 1500 | 0.9168 | -3.9688 | -4.1562 | 0.5352 | 0.1943 | -704.0 | -716.0 | -10.6875 | -11.3125 |
| 0.1427 | 1.6745 | 1600 | 0.9187 | -4.3125 | -4.5625 | 0.5234 | 0.2656 | -744.0 | -748.0 | -10.125 | -11.0 |
| 0.1592 | 1.7792 | 1700 | 0.8701 | -4.6875 | -5.0312 | 0.5586 | 0.3516 | -792.0 | -784.0 | -10.4375 | -11.25 |
| 0.1341 | 1.8838 | 1800 | 0.9226 | -3.9531 | -4.2188 | 0.5391 | 0.2598 | -708.0 | -712.0 | -9.625 | -10.5625 |
| 0.1366 | 1.9885 | 1900 | 0.9103 | -4.1562 | -4.4375 | 0.5234 | 0.2754 | -732.0 | -736.0 | -9.9375 | -10.75 |
| 0.026 | 2.0931 | 2000 | 1.0973 | -5.7812 | -6.125 | 0.5254 | 0.3379 | -900.0 | -896.0 | -6.5 | -7.5312 |
| 0.0178 | 2.1978 | 2100 | 1.1703 | -6.0312 | -6.4375 | 0.5293 | 0.3867 | -932.0 | -924.0 | -6.2188 | -7.2812 |
| 0.019 | 2.3025 | 2200 | 1.1800 | -6.4062 | -6.8125 | 0.5312 | 0.4004 | -968.0 | -960.0 | -5.9062 | -6.9688 |
| 0.0173 | 2.4071 | 2300 | 1.1893 | -6.3438 | -6.75 | 0.5293 | 0.3965 | -964.0 | -952.0 | -5.7188 | -6.7812 |
| 0.0147 | 2.5118 | 2400 | 1.2635 | -6.7188 | -7.125 | 0.5176 | 0.3926 | -1000.0 | -992.0 | -5.375 | -6.4688 |
| 0.016 | 2.6164 | 2500 | 1.2629 | -6.75 | -7.125 | 0.5195 | 0.375 | -1000.0 | -992.0 | -5.3125 | -6.4062 |
| 0.0171 | 2.7211 | 2600 | 1.2716 | -6.8438 | -7.2188 | 0.5176 | 0.3809 | -1012.0 | -1004.0 | -5.125 | -6.2188 |
| 0.0123 | 2.8257 | 2700 | 1.2615 | -6.875 | -7.25 | 0.5195 | 0.3867 | -1016.0 | -1008.0 | -5.0 | -6.0938 |
| 0.0198 | 2.9304 | 2800 | 1.2575 | -6.8438 | -7.25 | 0.5215 | 0.3887 | -1012.0 | -1004.0 | -5.0 | -6.125 |
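One way to read the log above is to compare training and validation loss at a few checkpoints and see where validation loss is lowest. The sketch below does this for a hand-copied subset of rows (step, training loss, validation loss). The names `ROWS` and `best_by_validation_loss` are illustrative, and validation loss is only one possible selection criterion: the reward margin, for instance, keeps growing in later epochs.

```python
# (step, training_loss, validation_loss), copied from the training log.
ROWS = [
    (100, 0.6224, 0.6811),
    (900, 0.6114, 0.7060),
    (1000, 0.1659, 0.8400),
    (1900, 0.1366, 0.9103),
    (2000, 0.026, 1.0973),
    (2800, 0.0198, 1.2575),
]


def best_by_validation_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gaps(rows):
    """Map each step to validation loss minus training loss."""
    return {step: val - train for step, train, val in rows}


best_step, _, best_val = best_by_validation_loss(ROWS)
print("lowest validation loss:", best_val, "at step", best_step)
for step, gap in generalization_gaps(ROWS).items():
    print(step, round(gap, 4))
```

In this subset the gap between validation and training loss widens at each epoch boundary, which is the pattern to look for when deciding how many epochs to keep.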

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
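To check a local environment against the versions listed above, a small script along these lines may help. It uses only the standard library. The distribution names in `PINNED` are assumptions about how each framework is published (in particular, PyTorch is assumed to be installed under the name `torch`), and installed builds may carry a local suffix such as `+cu121`.

```python
from importlib import metadata

# Versions from the card; keys are the assumed distribution names.
PINNED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Ignore local build suffixes such as "+cu121" when comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (expected, installed, base == expected)
    return report


for name, (expected, installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```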