
zephyr-7b-dpo-qlora

This model is a QLoRA (LoRA adapter) DPO fine-tune of alignment-handbook/zephyr-7b-sft-full. It achieves the following results on the evaluation set (the metric names follow TRL's DPO conventions, summarized after the list):

  • Loss: 0.5443
  • Rewards/chosen: -0.4815
  • Rewards/rejected: -0.9488
  • Rewards/accuracies: 0.7228
  • Rewards/margins: 0.4673
  • Logps/rejected: -353.7514
  • Logps/chosen: -321.4202
  • Logits/rejected: -1.9090
  • Logits/chosen: -2.0033
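
For context, these Rewards/*, Logps/*, and Logits/* columns are the metrics logged by TRL's DPOTrainer. As a hedged summary of the usual DPO definitions (background from the DPO formulation, not stated in the original card; β is the DPO temperature, whose value for this run is not recorded here):

$$ r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big] $$

Rewards/chosen and Rewards/rejected are the mean implicit rewards of the chosen and rejected responses, Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and Logps/* are the total log-probabilities of each response under the policy.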

Model description

More information needed

Intended uses & limitations

More information needed
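
Although this section is unfilled, the repository is a PEFT (LoRA) adapter, so it can be attached to the base model with peft. A minimal, hedged usage sketch (the repo ids come from this card; the 4-bit settings, generation parameters, and the assumption that the adapter repo ships a tokenizer are mine):

```python
# Hedged example (not from the original card): load the QLoRA DPO adapter on top of
# the SFT base model. Drop the 4-bit config if you have enough GPU memory.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

adapter_id = "LeeSB/zephyr-7b-dpo-qlora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# AutoPeftModelForCausalLM reads the adapter config, downloads the base model
# (alignment-handbook/zephyr-7b-sft-full), and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# If the adapter repo has no tokenizer files, load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```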

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a training sketch based on them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
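
These settings map onto TRL's DPOTrainer, the trainer typically used for QLoRA DPO runs on zephyr-7b-sft-full. A minimal sketch, assuming the TRL ~0.7.x API (plain TrainingArguments plus a beta argument; newer TRL releases use DPOConfig instead). Only the values marked as coming from this card are taken from the list above; the toy dataset, LoRA configuration, beta, bf16, and sequence lengths are illustrative assumptions.

```python
# Hedged reconstruction of the training setup implied by the hyperparameters above.
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(   # QLoRA: 4-bit base weights
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

peft_config = LoraConfig(  # adapter shape is an assumption; the card does not list it
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder preference data in DPOTrainer's prompt/chosen/rejected format.
toy = Dataset.from_dict({
    "prompt": ["What does QLoRA do?"],
    "chosen": ["It fine-tunes low-rank adapters on top of a 4-bit quantized base model."],
    "rejected": ["No idea."],
})

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,              # from this card
    per_device_train_batch_size=4,   # train_batch_size; x4 GPUs = total_train_batch_size 16
    per_device_eval_batch_size=8,    # eval_batch_size; x4 GPUs = total_eval_batch_size 32
    num_train_epochs=1,              # from this card
    lr_scheduler_type="cosine",      # from this card
    warmup_ratio=0.1,                # from this card
    seed=42,                         # from this card
    evaluation_strategy="steps",
    eval_steps=100,                  # matches the 100-step eval cadence in the results table
    bf16=True,                       # assumption; precision is not stated in the card
    # default AdamW already uses betas=(0.9, 0.999) and epsilon=1e-08, as listed above
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base weights act as the reference
    args=args,
    beta=0.1,              # assumed DPO temperature; not recorded in the card
    train_dataset=toy,
    eval_dataset=toy,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,       # assumed sequence limits
    max_prompt_length=512,
)
trainer.train()
```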

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6906 | 0.03 | 100 | 0.6904 | 0.0079 | 0.0034 | 0.5998 | 0.0045 | -258.5292 | -272.4813 | -2.5792 | -2.6794 |
| 0.6832 | 0.05 | 200 | 0.6789 | 0.0377 | 0.0100 | 0.6310 | 0.0277 | -257.8704 | -269.5003 | -2.5936 | -2.6947 |
| 0.6593 | 0.08 | 300 | 0.6569 | 0.0387 | -0.0302 | 0.6613 | 0.0689 | -261.8914 | -269.4003 | -2.5863 | -2.6882 |
| 0.6187 | 0.11 | 400 | 0.6299 | -0.0829 | -0.2296 | 0.6633 | 0.1467 | -281.8324 | -281.5615 | -2.5720 | -2.6736 |
| 0.6725 | 0.13 | 500 | 0.6312 | 0.0386 | -0.1012 | 0.6724 | 0.1399 | -268.9954 | -269.4083 | -2.5082 | -2.6086 |
| 0.6094 | 0.16 | 600 | 0.6134 | -0.0997 | -0.2926 | 0.6724 | 0.1929 | -288.1276 | -283.2419 | -2.4269 | -2.5264 |
| 0.622 | 0.19 | 700 | 0.6138 | -0.0662 | -0.2635 | 0.6764 | 0.1974 | -285.2251 | -279.8869 | -2.4243 | -2.5237 |
| 0.6219 | 0.21 | 800 | 0.6040 | -0.0706 | -0.3182 | 0.6905 | 0.2476 | -290.6908 | -280.3326 | -2.3385 | -2.4372 |
| 0.623 | 0.24 | 900 | 0.5902 | -0.3711 | -0.7045 | 0.6855 | 0.3334 | -329.3168 | -310.3764 | -2.3155 | -2.4141 |
| 0.6097 | 0.27 | 1000 | 0.5900 | -0.4292 | -0.7963 | 0.6734 | 0.3670 | -338.4981 | -316.1939 | -2.1784 | -2.2752 |
| 0.6136 | 0.29 | 1100 | 0.5784 | -0.1724 | -0.5165 | 0.7046 | 0.3441 | -310.5219 | -290.5160 | -2.1727 | -2.2710 |
| 0.6567 | 0.32 | 1200 | 0.5606 | -0.3899 | -0.8591 | 0.7188 | 0.4691 | -344.7808 | -312.2660 | -2.1098 | -2.2080 |
| 0.643 | 0.35 | 1300 | 0.5635 | -0.4687 | -0.8505 | 0.7107 | 0.3818 | -343.9194 | -320.1420 | -2.1156 | -2.2131 |
| 0.5965 | 0.37 | 1400 | 0.5605 | -0.5007 | -0.9440 | 0.6925 | 0.4433 | -353.2680 | -323.3370 | -2.0815 | -2.1787 |
| 0.5845 | 0.4 | 1500 | 0.5532 | -0.4826 | -0.9062 | 0.7137 | 0.4236 | -349.4915 | -321.5331 | -2.1051 | -2.2018 |
| 0.5877 | 0.43 | 1600 | 0.5531 | -0.4294 | -0.8415 | 0.7248 | 0.4122 | -343.0254 | -316.2077 | -2.0961 | -2.1928 |
| 0.5718 | 0.45 | 1700 | 0.5541 | -0.4126 | -0.8317 | 0.7167 | 0.4192 | -342.0433 | -314.5273 | -2.0529 | -2.1491 |
| 0.5572 | 0.48 | 1800 | 0.5574 | -0.4879 | -0.8998 | 0.7177 | 0.4119 | -348.8538 | -322.0596 | -2.0373 | -2.1342 |
| 0.6168 | 0.51 | 1900 | 0.5588 | -0.4436 | -0.8496 | 0.7046 | 0.4060 | -343.8322 | -317.6336 | -2.0368 | -2.1330 |
| 0.5584 | 0.53 | 2000 | 0.5511 | -0.4453 | -0.8705 | 0.7188 | 0.4252 | -345.9210 | -317.8035 | -2.0010 | -2.0971 |
| 0.5863 | 0.56 | 2100 | 0.5504 | -0.4714 | -0.9080 | 0.7218 | 0.4365 | -349.6664 | -320.4117 | -1.9625 | -2.0577 |
| 0.5805 | 0.59 | 2200 | 0.5509 | -0.4003 | -0.8341 | 0.7218 | 0.4338 | -342.2770 | -313.2971 | -1.9579 | -2.0528 |
| 0.5853 | 0.61 | 2300 | 0.5429 | -0.4611 | -0.9355 | 0.7188 | 0.4743 | -352.4162 | -319.3822 | -1.9440 | -2.0390 |
| 0.5561 | 0.64 | 2400 | 0.5407 | -0.5040 | -0.9820 | 0.7208 | 0.4780 | -357.0744 | -323.6726 | -1.9359 | -2.0311 |
| 0.559 | 0.67 | 2500 | 0.5508 | -0.4111 | -0.8553 | 0.7208 | 0.4442 | -344.4033 | -314.3806 | -1.9377 | -2.0324 |
| 0.5803 | 0.69 | 2600 | 0.5507 | -0.4118 | -0.8515 | 0.7218 | 0.4397 | -344.0258 | -314.4533 | -1.9263 | -2.0208 |
| 0.5537 | 0.72 | 2700 | 0.5453 | -0.4866 | -0.9460 | 0.7188 | 0.4594 | -353.4755 | -321.9308 | -1.9121 | -2.0064 |
| 0.5562 | 0.75 | 2800 | 0.5441 | -0.5000 | -0.9628 | 0.7228 | 0.4628 | -355.1509 | -323.2741 | -1.9088 | -2.0030 |
| 0.5553 | 0.77 | 2900 | 0.5442 | -0.4959 | -0.9606 | 0.7188 | 0.4647 | -354.9320 | -322.8617 | -1.9075 | -2.0017 |
| 0.59 | 0.8 | 3000 | 0.5431 | -0.5074 | -0.9824 | 0.7188 | 0.4749 | -357.1091 | -324.0160 | -1.9088 | -2.0032 |
| 0.6168 | 0.83 | 3100 | 0.5439 | -0.4857 | -0.9579 | 0.7188 | 0.4722 | -354.6618 | -321.8407 | -1.9073 | -2.0016 |
| 0.5655 | 0.85 | 3200 | 0.5443 | -0.4817 | -0.9508 | 0.7198 | 0.4691 | -353.9527 | -321.4442 | -1.9095 | -2.0037 |
| 0.5201 | 0.88 | 3300 | 0.5441 | -0.4835 | -0.9518 | 0.7228 | 0.4683 | -354.0496 | -321.6215 | -1.9082 | -2.0024 |
| 0.5576 | 0.91 | 3400 | 0.5443 | -0.4814 | -0.9485 | 0.7248 | 0.4672 | -353.7254 | -321.4092 | -1.9113 | -2.0056 |
| 0.5929 | 0.93 | 3500 | 0.5443 | -0.4814 | -0.9481 | 0.7248 | 0.4667 | -353.6854 | -321.4121 | -1.9086 | -2.0029 |
| 0.507 | 0.96 | 3600 | 0.5443 | -0.4815 | -0.9488 | 0.7198 | 0.4673 | -353.7491 | -321.4216 | -1.9086 | -2.0028 |
| 0.5412 | 0.99 | 3700 | 0.5443 | -0.4815 | -0.9488 | 0.7228 | 0.4673 | -353.7514 | -321.4202 | -1.9090 | -2.0033 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2