
llama3_8b_instruct_dpo_bwgenerator

This model is a PEFT adapter fine-tuned with DPO from NanQiangHF/llama3_8b_instruct_bwgenerator; the preference dataset used for training is not documented. It achieves the following results on the evaluation set:

  • Loss: 0.0706
  • Rewards/chosen: -4.6241
  • Rewards/rejected: -14.8342
  • Rewards/accuracies: 0.9780
  • Rewards/margins: 10.2101
  • Logps/rejected: -216.1456
  • Logps/chosen: -84.8191
  • Logits/rejected: 0.9202
  • Logits/chosen: 0.3552

Model description

More information needed

Intended uses & limitations

More information needed
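
Since this repository ships a PEFT adapter on top of the base model, one way to load it is with transformers plus peft. The following is a minimal sketch, assuming standard PeftModel usage; only the two repo IDs come from this card, and the bfloat16 dtype and prompt are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NanQiangHF/llama3_8b_instruct_bwgenerator"        # base model (from this card)
adapter_id = "NanQiangHF/llama3_8b_instruct_dpo_bwgenerator" # this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype that fits your hardware
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

prompt = "Write an example prompt here."  # illustrative placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```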

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch using trl follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
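
In trl terms, these settings map onto a DPOConfig as sketched below. This is an assumption-laden reconstruction, not the released training script: the preference dataset is undocumented (a placeholder is used), the DPO beta and the PEFT/LoRA settings are not recorded in this card, and the trl API shown is the one contemporary with Transformers 4.44.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "NanQiangHF/llama3_8b_instruct_bwgenerator"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder preference pairs; the real dataset is not documented in this card.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

# The values below are the hyperparameters listed above; everything else is a default.
args = DPOConfig(
    output_dir="llama3_8b_instruct_dpo_bwgenerator",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # trl clones the policy as the frozen reference model
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # called processing_class in newer trl versions
)
trainer.train()
```

Since PEFT 0.10.0 appears in the framework versions, training presumably also passed a LoRA peft_config to DPOTrainer; its exact settings are not recorded here.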

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.247         | 0.0719 | 1000  | 0.0906          | -3.7216        | -11.8877         | 0.9686             | 8.1662          | -186.6814      | -75.7941     | 0.8504          | 0.3080        |
| 0.083         | 0.1438 | 2000  | 0.0775          | -4.5564        | -14.1375         | 0.9764             | 9.5811          | -209.1791      | -84.1423     | 0.8989          | 0.3418        |
| 0.0623        | 0.2157 | 3000  | 0.0734          | -4.5379        | -14.4993         | 0.9770             | 9.9614          | -212.7973      | -83.9572     | 0.9082          | 0.3471        |
| 0.069         | 0.2876 | 4000  | 0.0713          | -4.5601        | -14.6450         | 0.9777             | 10.0850         | -214.2546      | -84.1790     | 0.9145          | 0.3514        |
| 0.0752        | 0.3595 | 5000  | 0.0706          | -4.4918        | -14.6244         | 0.9793             | 10.1326         | -214.0477      | -83.4960     | 0.9181          | 0.3533        |
| 0.0723        | 0.4313 | 6000  | 0.0710          | -4.6381        | -14.8167         | 0.9780             | 10.1787         | -215.9714      | -84.9590     | 0.9187          | 0.3542        |
| 0.0852        | 0.5032 | 7000  | 0.0705          | -4.6251        | -14.8143         | 0.9783             | 10.1893         | -215.9474      | -84.8290     | 0.9189          | 0.3542        |
| 0.0811        | 0.5751 | 8000  | 0.0706          | -4.6409        | -14.8406         | 0.9780             | 10.1997         | -216.2102      | -84.9870     | 0.9185          | 0.3538        |
| 0.0762        | 0.6470 | 9000  | 0.0699          | -4.6161        | -14.8083         | 0.9790             | 10.1921         | -215.8869      | -84.7398     | 0.9186          | 0.3541        |
| 0.0686        | 0.7189 | 10000 | 0.0703          | -4.6164        | -14.8042         | 0.9790             | 10.1878         | -215.8462      | -84.7421     | 0.9185          | 0.3537        |
| 0.061         | 0.7908 | 11000 | 0.0705          | -4.6191        | -14.8169         | 0.9793             | 10.1977         | -215.9726      | -84.7695     | 0.9207          | 0.3556        |
| 0.0786        | 0.8627 | 12000 | 0.0698          | -4.6080        | -14.7978         | 0.9793             | 10.1898         | -215.7822      | -84.6584     | 0.9195          | 0.3546        |
| 0.073         | 0.9346 | 13000 | 0.0706          | -4.6241        | -14.8342         | 0.9780             | 10.2101         | -216.1456      | -84.8191     | 0.9202          | 0.3552        |
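
For reference, the Rewards/* columns follow the standard DPO bookkeeping: each reward is the beta-scaled log-probability ratio between the policy and the frozen reference model, and Rewards/margins is chosen minus rejected. The objective being minimized is the usual DPO loss (the beta used here is not recorded in this card):

```latex
% Standard DPO objective (Rafailov et al., 2023); \beta is not recorded in this card.
\mathcal{L}_{\mathrm{DPO}} =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
    \log \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
  \right],
\qquad
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```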

Framework versions

  • PEFT 0.10.0
  • Transformers 4.44.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.14.7
  • Tokenizers 0.19.1