metadata
library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-reward-most-similar
    results: []

OpenELM-1_1B-DPO-full-max-reward-most-similar

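The auto-generated trainer metadata records no base model and no dataset name for this model; the tags indicate DPO training with TRL. It achieves the following results on the evaluation set (these match the final logged evaluation at step 2800, epoch 2.9319):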

  • Loss: 1.6465
  • Rewards/chosen: -17.75
  • Rewards/rejected: -19.75
  • Rewards/accuracies: 0.6055
  • Rewards/margins: 2.0469
  • Logps/rejected: -2272.0
  • Logps/chosen: -2096.0
  • Logits/rejected: 2.0312
  • Logits/chosen: 0.2393
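
As a quick sanity check on the numbers above, the reward margin can be recomputed from the two reward means. This is a minimal sketch that assumes the margin is logged as the mean of (chosen reward minus rejected reward), which is how TRL's DPO trainer is understood to report it; the variable names are illustrative and not taken from the training code.

```python
# Evaluation values copied from the list above.
rewards_chosen = -17.75
rewards_rejected = -19.75
reported_margin = 2.0469

# Margin recomputed from the two rounded means (assumed definition:
# mean chosen reward minus mean rejected reward).
recomputed_margin = rewards_chosen - rewards_rejected
gap = abs(recomputed_margin - reported_margin)

print(f"recomputed margin: {recomputed_margin:.4f}")
print(f"gap to reported margin: {gap:.4f}")
```

The recomputed value is close to, but not identical with, the reported margin. The small gap is plausibly a rounding effect in the logged values; the card does not state the cause.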

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
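
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with a 10% linear warmup. The `lr_at` helper is hypothetical and written for illustration; it mirrors the usual warmup-then-cosine shape rather than quoting the trainer's implementation, and `total_steps=1000` is a placeholder because the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is an illustrative placeholder, not a value from the card.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000))
```

Under these assumptions the learning rate rises linearly to 5e-05 over the first 10% of steps and then decays along a half cosine to zero at the final step.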

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5786 | 0.1047 | 100 | 0.6689 | -1.8203 | -2.0625 | 0.6094 | 0.2373 | -494.0 | -500.0 | -9.25 | -9.75 |
| 0.5359 | 0.2094 | 200 | 0.7366 | -3.3281 | -3.8125 | 0.5898 | 0.4824 | -672.0 | -652.0 | -1.7812 | -2.8906 |
| 0.5163 | 0.3141 | 300 | 0.6974 | -4.25 | -4.8438 | 0.6426 | 0.6016 | -776.0 | -744.0 | -5.4688 | -6.75 |
| 0.5127 | 0.4188 | 400 | 0.7937 | -5.375 | -6.0625 | 0.6016 | 0.6797 | -896.0 | -856.0 | -7.9375 | -9.125 |
| 0.5047 | 0.5236 | 500 | 0.7909 | -4.5938 | -5.2188 | 0.5703 | 0.6523 | -812.0 | -776.0 | -3.7188 | -5.5938 |
| 0.5057 | 0.6283 | 600 | 0.8288 | -5.375 | -6.125 | 0.5918 | 0.7539 | -904.0 | -856.0 | -4.5 | -6.4062 |
| 0.48 | 0.7330 | 700 | 0.7987 | -5.5312 | -6.4062 | 0.6289 | 0.8633 | -928.0 | -872.0 | -3.8438 | -5.6562 |
| 0.4751 | 0.8377 | 800 | 0.8430 | -7.0625 | -7.7812 | 0.5586 | 0.7070 | -1064.0 | -1024.0 | -4.3125 | -6.125 |
| 0.4408 | 0.9424 | 900 | 0.8971 | -8.3125 | -9.1875 | 0.5996 | 0.9023 | -1208.0 | -1152.0 | -6.3438 | -8.1875 |
| 0.1609 | 1.0471 | 1000 | 0.9796 | -8.1875 | -9.1875 | 0.5996 | 1.0156 | -1208.0 | -1136.0 | -1.7734 | -3.7656 |
| 0.1551 | 1.1518 | 1100 | 1.2334 | -13.8125 | -15.0625 | 0.5938 | 1.2422 | -1792.0 | -1704.0 | -0.2617 | -2.0312 |
| 0.1584 | 1.2565 | 1200 | 1.0642 | -10.375 | -11.5625 | 0.5918 | 1.1641 | -1440.0 | -1360.0 | -2.1875 | -3.9844 |
| 0.1618 | 1.3613 | 1300 | 0.9750 | -9.1875 | -10.3125 | 0.6211 | 1.1484 | -1320.0 | -1240.0 | -1.25 | -3.0781 |
| 0.1667 | 1.4660 | 1400 | 1.0401 | -9.75 | -11.125 | 0.6191 | 1.3125 | -1400.0 | -1296.0 | -1.1094 | -3.1875 |
| 0.1714 | 1.5707 | 1500 | 1.0380 | -10.6875 | -12.0625 | 0.6230 | 1.3438 | -1496.0 | -1392.0 | -0.2578 | -2.1719 |
| 0.1406 | 1.6754 | 1600 | 1.0427 | -11.25 | -12.625 | 0.6211 | 1.375 | -1552.0 | -1440.0 | -0.0874 | -2.0469 |
| 0.1195 | 1.7801 | 1700 | 1.1374 | -12.25 | -13.625 | 0.6133 | 1.3906 | -1648.0 | -1544.0 | -0.4316 | -2.1875 |
| 0.1291 | 1.8848 | 1800 | 1.0742 | -11.6875 | -13.0625 | 0.5938 | 1.3438 | -1592.0 | -1488.0 | 0.0305 | -1.7344 |
| 0.1236 | 1.9895 | 1900 | 1.1539 | -13.0 | -14.375 | 0.5840 | 1.3984 | -1728.0 | -1616.0 | 0.7383 | -0.9727 |
| 0.0264 | 2.0942 | 2000 | 1.5533 | -16.5 | -18.25 | 0.5840 | 1.75 | -2112.0 | -1968.0 | 1.1562 | -0.625 |
| 0.0222 | 2.1990 | 2100 | 1.6053 | -17.375 | -19.25 | 0.5957 | 1.8906 | -2224.0 | -2064.0 | 2.0781 | 0.3105 |
| 0.0266 | 2.3037 | 2200 | 1.5843 | -17.125 | -19.0 | 0.6055 | 1.8672 | -2192.0 | -2032.0 | 1.9297 | 0.0918 |
| 0.0247 | 2.4084 | 2300 | 1.6309 | -17.875 | -19.875 | 0.6094 | 2.0 | -2288.0 | -2112.0 | 2.1719 | 0.3652 |
| 0.0381 | 2.5131 | 2400 | 1.6237 | -17.75 | -19.625 | 0.6055 | 1.9219 | -2256.0 | -2096.0 | 2.0 | 0.2354 |
| 0.0307 | 2.6178 | 2500 | 1.6102 | -17.375 | -19.375 | 0.6055 | 2.0156 | -2224.0 | -2064.0 | 1.9141 | 0.1069 |
| 0.0259 | 2.7225 | 2600 | 1.6399 | -17.75 | -19.75 | 0.6035 | 2.0469 | -2272.0 | -2096.0 | 2.0469 | 0.2773 |
| 0.0279 | 2.8272 | 2700 | 1.6252 | -17.5 | -19.5 | 0.6074 | 2.0312 | -2240.0 | -2064.0 | 1.9609 | 0.1533 |
| 0.0219 | 2.9319 | 2800 | 1.6465 | -17.75 | -19.75 | 0.6055 | 2.0469 | -2272.0 | -2096.0 | 2.0312 | 0.2393 |
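
The Epoch and Step columns in the table above allow a rough estimate of the run's size, which the card does not state directly. The sketch below derives the approximate number of optimizer steps per epoch from the last logged row and multiplies by the total train batch size of 64. The results are estimates only: the logged epoch values are rounded, and a partial final batch would shift the example count slightly.

```python
# Last logged row of the training results table.
step, epoch = 2800, 2.9319

# Values from the hyperparameter list.
total_train_batch_size = 64
num_epochs = 3

# Approximate optimizer steps per epoch, inferred from the step/epoch ratio.
steps_per_epoch = round(step / epoch)

# Rough size of the training set and of the full run (estimates, not card values).
approx_train_examples = steps_per_epoch * total_train_batch_size
approx_total_steps = steps_per_epoch * num_epochs

print(f"steps per epoch: ~{steps_per_epoch}")
print(f"training examples: ~{approx_train_examples}")
print(f"total optimizer steps: ~{approx_total_steps}")
```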

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0