---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e6rate_01beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e6rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.1802
- Rewards/chosen: -0.8216
- Rewards/rejected: -13.7782
- Rewards/accuracies: 0.7400
- Rewards/margins: 12.9566
- Logps/rejected: -213.4093
- Logps/chosen: -91.0134
- Logits/rejected: -0.8670
- Logits/chosen: -0.7142
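
These reward metrics come from the DPO objective: each reward is the beta-scaled log-probability ratio between this model and its SFT reference, so Rewards/margins is Rewards/chosen minus Rewards/rejected (-0.8216 - (-13.7782) ≈ 12.9566). The snippet below is a minimal inference sketch, assuming only standard `transformers` APIs; the prompt, dtype, and generation settings are illustrative and not taken from this card.

```python
# Minimal inference sketch. The dtype, device placement, and generation
# settings are assumptions, not part of the training setup described here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e6rate_01beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU
    device_map="auto",           # requires `accelerate`
)

prompt = "Extract the key entities from the following text: ..."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```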

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
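
Together with the beta = 0.1 implied by the "01beta" in the model name, these settings map directly onto TRL's `DPOConfig`. The sketch below is one plausible reconstruction, assuming a TRL version contemporary with the framework versions listed further down and a hypothetical preference dataset with `prompt`/`chosen`/`rejected` columns; the actual training script is not part of this card.

```python
# Plausible reconstruction of the DPO run from the hyperparameters above.
# Assumptions: beta=0.1 (from the model name) and a hypothetical
# preferences.json file with "prompt"/"chosen"/"rejected" fields.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

preference_dataset = load_dataset("json", data_files="preferences.json")["train"]

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e6rate_01beta_cSFTDPO",
    beta=0.1,                       # assumption: "01beta" in the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = effective batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,          # ref_model omitted; TRL clones one internally
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```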

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1913 | 0.4 | 50 | 0.1803 | -0.5046 | -8.7772 | 0.7400 | 8.2726 | -163.3993 | -87.8437 | -0.8451 | -0.7284 |
| 0.1386 | 0.8 | 100 | 0.1802 | -1.0228 | -11.9098 | 0.7400 | 10.8870 | -194.7255 | -93.0261 | -0.8546 | -0.7152 |
| 0.1386 | 1.2 | 150 | 0.1802 | -0.6732 | -12.7363 | 0.7400 | 12.0631 | -202.9905 | -89.5298 | -0.8582 | -0.7093 |
| 0.1733 | 1.6 | 200 | 0.1802 | -0.6775 | -12.8705 | 0.7400 | 12.1930 | -204.3321 | -89.5723 | -0.8611 | -0.7114 |
| 0.2253 | 2.0 | 250 | 0.1802 | -0.7149 | -13.0474 | 0.7400 | 12.3326 | -206.1017 | -89.9464 | -0.8603 | -0.7104 |
| 0.1386 | 2.4 | 300 | 0.1802 | -0.7327 | -13.0995 | 0.7400 | 12.3668 | -206.6222 | -90.1248 | -0.8593 | -0.7091 |
| 0.1213 | 2.8 | 350 | 0.1802 | -0.7598 | -13.2905 | 0.7400 | 12.5307 | -208.5327 | -90.3961 | -0.8621 | -0.7116 |
| 0.1906 | 3.2 | 400 | 0.1802 | -0.7893 | -13.4540 | 0.7400 | 12.6647 | -210.1669 | -90.6907 | -0.8653 | -0.7135 |
| 0.1906 | 3.6 | 450 | 0.1802 | -0.7880 | -13.4497 | 0.7400 | 12.6617 | -210.1245 | -90.6778 | -0.8657 | -0.7141 |
| 0.2079 | 4.0 | 500 | 0.1802 | -0.8075 | -13.6024 | 0.7400 | 12.7949 | -211.6511 | -90.8724 | -0.8653 | -0.7127 |
| 0.156  | 4.4 | 550 | 0.1802 | -0.8042 | -13.6207 | 0.7400 | 12.8165 | -211.8345 | -90.8401 | -0.8658 | -0.7138 |
| 0.1213 | 4.8 | 600 | 0.1802 | -0.8154 | -13.6478 | 0.7400 | 12.8323 | -212.1049 | -90.9520 | -0.8661 | -0.7139 |
| 0.1906 | 5.2 | 650 | 0.1802 | -0.8263 | -13.7419 | 0.7400 | 12.9156 | -213.0464 | -91.0612 | -0.8667 | -0.7144 |
| 0.2426 | 5.6 | 700 | 0.1802 | -0.8316 | -13.7569 | 0.7400 | 12.9253 | -213.1964 | -91.1135 | -0.8668 | -0.7144 |
| 0.2599 | 6.0 | 750 | 0.1802 | -0.8155 | -13.7626 | 0.7400 | 12.9471 | -213.2537 | -90.9532 | -0.8669 | -0.7141 |
| 0.1213 | 6.4 | 800 | 0.1802 | -0.8348 | -13.7975 | 0.7400 | 12.9627 | -213.6019 | -91.1453 | -0.8666 | -0.7139 |
| 0.2426 | 6.8 | 850 | 0.1802 | -0.8359 | -13.7784 | 0.7400 | 12.9425 | -213.4111 | -91.1564 | -0.8664 | -0.7143 |
| 0.1733 | 7.2 | 900 | 0.1802 | -0.8274 | -13.7943 | 0.7400 | 12.9670 | -213.5706 | -91.0716 | -0.8673 | -0.7144 |
| 0.1386 | 7.6 | 950 | 0.1802 | -0.8173 | -13.7791 | 0.7400 | 12.9618 | -213.4180 | -90.9708 | -0.8670 | -0.7140 |
| 0.156  | 8.0 | 1000 | 0.1802 | -0.8216 | -13.7782 | 0.7400 | 12.9566 | -213.4093 | -91.0134 | -0.8670 | -0.7142 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1