license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-dpo-full-ultrabin-high-curriculum
    results: []

zephyr-7b-dpo-full-ultrabin-high-curriculum

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full, trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5058
  • Rewards/chosen: -1.0085
  • Rewards/rejected: -1.9531
  • Rewards/accuracies: 0.7617
  • Rewards/margins: 0.9446
  • Logps/rejected: -457.9681
  • Logps/chosen: -363.4786
  • Logits/rejected: 2.0914
  • Logits/chosen: 1.3032
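
The reward metrics above are not independent, so they can be cross-checked against each other. The sketch below assumes that each reported value is a mean over the evaluation set and that the margin is the mean of the chosen reward minus the rejected reward; the variable names are illustrative and do not come from the training code.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -1.0085
rewards_rejected = -1.9531
reported_margin = 0.9446

# Assumption: margin = mean(chosen - rejected), so the difference of the
# two reported means should reproduce the reported margin up to rounding.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported: {abs(computed_margin - reported_margin) < 1e-4}")
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more strongly than the chosen ones.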

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 55
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
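
The total batch sizes listed above follow from the per-device values. As a quick check, assuming train_batch_size and eval_batch_size are per-device sizes and that gradient accumulation applies only to training:

```python
# Hyperparameters copied from the list above.
train_batch_size = 8             # assumed to be per device
eval_batch_size = 8              # assumed to be per device
num_devices = 8
gradient_accumulation_steps = 2

# Effective training batch: every device contributes one micro-batch per
# accumulation step before the optimizer updates the weights.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count scales it.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```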

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6557 | 0.1046 | 50 | 0.6479 | -0.0110 | -0.1284 | 0.7070 | 0.1174 | -275.5045 | -263.7303 | -2.6116 | -2.6493 |
| 0.5639 | 0.2092 | 100 | 0.5652 | -0.6096 | -1.2098 | 0.7383 | 0.6002 | -383.6375 | -323.5883 | -0.5618 | -0.9012 |
| 0.5323 | 0.3138 | 150 | 0.5405 | -0.5498 | -1.2901 | 0.7617 | 0.7402 | -391.6696 | -317.6140 | 0.4792 | -0.0900 |
| 0.536 | 0.4184 | 200 | 0.5354 | -0.6382 | -1.3831 | 0.7656 | 0.7449 | -400.9734 | -326.4470 | 0.6525 | -0.0195 |
| 0.5163 | 0.5230 | 250 | 0.5185 | -1.1124 | -1.9604 | 0.7383 | 0.8480 | -458.7008 | -373.8662 | 2.4883 | 1.7620 |
| 0.5018 | 0.6276 | 300 | 0.5108 | -0.9326 | -1.8124 | 0.7578 | 0.8798 | -443.9044 | -355.8924 | 2.0905 | 1.3198 |
| 0.4999 | 0.7322 | 350 | 0.5094 | -1.0356 | -1.9491 | 0.7461 | 0.9135 | -457.5764 | -366.1917 | 2.0403 | 1.2353 |
| 0.4966 | 0.8368 | 400 | 0.5066 | -0.9929 | -1.9227 | 0.7578 | 0.9298 | -454.9321 | -361.9198 | 2.0226 | 1.2642 |
| 0.5198 | 0.9414 | 450 | 0.5057 | -1.0106 | -1.9562 | 0.7617 | 0.9455 | -458.2784 | -363.6942 | 2.1024 | 1.3120 |
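
The Epoch and Step columns above make it possible to estimate the length of the run and, from that, the shape of the learning-rate schedule. The sketch below is an approximation under stated assumptions: the total step count is inferred from the first logged row, the warmup length is taken as the ceiling of warmup ratio times total steps, and `lr_at` is a hypothetical helper implementing the usual linear-warmup plus cosine-decay formula rather than the trainer's own code.

```python
import math

# Values copied from the hyperparameters and the first table row.
base_lr = 5e-07
warmup_ratio = 0.1
logged_step, logged_epoch = 50, 0.1046

# Estimate optimizer steps in the single training epoch from the logged pair.
total_steps = round(logged_step / logged_epoch)

# Assumption: warmup steps are the ceiling of ratio * total steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"estimated total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps, 250, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at the end of warmup, before the first evaluation at step 50, and decays for the rest of the epoch.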

Framework versions

  • Transformers 4.44.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1