# zephyr-7b-dpo-full-beta-0.083

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full; the training dataset is not recorded in the card metadata. It achieves the following results on the evaluation set:
- Loss: 0.6981
- Rewards/chosen: -5.0359
- Rewards/rejected: -8.8405
- Rewards/accuracies: 0.7930
- Rewards/margins: 3.8046
- Logps/rejected: -345.7131
- Logps/chosen: -343.6803
- Logits/rejected: -2.5377
- Logits/chosen: -2.6128
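The reward metrics above follow the standard DPO formulation: the implicit reward of a completion is β times the gap between the policy and reference log-probabilities, with β = 0.083 (taken from the model name; it is not recorded elsewhere in this card). A minimal PyTorch sketch of the loss and these metrics, assuming summed per-sequence log-probabilities are already available, and using the metric names as trl's `DPOTrainer` logs them:

```python
import torch
import torch.nn.functional as F

def dpo_loss_and_metrics(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps, beta=0.083):
    """DPO loss plus the reward metrics reported above.

    Each argument is a 1-D tensor of summed per-sequence log-probabilities.
    beta=0.083 is inferred from the model name.
    """
    # Implicit rewards: beta * (log pi_theta - log pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards

    # DPO objective: -log sigmoid(reward margin)
    loss = -F.logsigmoid(margins).mean()

    metrics = {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": margins.mean().item(),
    }
    return loss, metrics
```

Logps/chosen and Logps/rejected in the table are the policy's summed log-probabilities on the chosen and rejected completions; the reference log-probabilities are not reported, so the rewards cannot be recomputed from the table alone.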
## Model description
More information needed
## Intended uses & limitations
More information needed
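Pending that information, here is a minimal loading-and-generation sketch, assuming the checkpoint is published on the Hugging Face Hub under `tianlinliu0121/zephyr-7b-dpo-full-beta-0.083` (the repo id of this card) and that the tokenizer carries the usual Zephyr chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tianlinliu0121/zephyr-7b-dpo-full-beta-0.083"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Zephyr models are chat-tuned; use the tokenizer's chat template if present.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```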
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
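The optimizer and schedule can be reproduced with standard PyTorch/Transformers utilities. A sketch under stated assumptions: gradient accumulation of 1 (8 per-device × 4 GPUs = 32, matching the total train batch size), roughly 1940 steps per epoch (inferred from the results table, which reaches step 5800 near epoch 3), and AdamW as the concrete optimizer (the card logs "Adam", which for the HF Trainer typically means `adamw_torch`):

```python
import torch
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup

# Placeholder model and step count; the real values come from the
# (unrecorded) training setup. Hyperparameters below are from this card.
model = nn.Linear(8, 8)   # stand-in for the policy model
steps_per_epoch = 1940    # assumption, inferred from the results table
num_epochs = 3

num_training_steps = steps_per_epoch * num_epochs
num_warmup_steps = int(0.1 * num_training_steps)  # lr_scheduler_warmup_ratio

optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-7, betas=(0.9, 0.999), eps=1e-8
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)
```

In a training loop, `scheduler.step()` is called after each `optimizer.step()` so the learning rate ramps up linearly over the first 10% of steps and decays linearly to zero thereafter.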
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6224 | 0.05 | 100 | 0.6037 | 0.0537 | -0.1781 | 0.7129 | 0.2318 | -241.3471 | -282.3603 | -3.0281 | -3.0500 |
0.4992 | 0.1 | 200 | 0.5157 | 0.0022 | -0.7572 | 0.7520 | 0.7594 | -248.3236 | -282.9802 | -2.9768 | -3.0038 |
0.5334 | 0.15 | 300 | 0.4983 | -0.1003 | -1.0616 | 0.7695 | 0.9613 | -251.9916 | -284.2161 | -3.0036 | -3.0272 |
0.5479 | 0.21 | 400 | 0.4918 | -0.1860 | -1.2242 | 0.7656 | 1.0381 | -253.9503 | -285.2488 | -3.1264 | -3.1487 |
0.531 | 0.26 | 500 | 0.4929 | -0.4703 | -1.5754 | 0.7559 | 1.1051 | -258.1821 | -288.6731 | -3.0860 | -3.1247 |
0.486 | 0.31 | 600 | 0.5096 | -0.3559 | -1.4424 | 0.7285 | 1.0865 | -256.5800 | -287.2958 | -3.0519 | -3.0920 |
0.4858 | 0.36 | 700 | 0.5079 | -0.6230 | -1.9605 | 0.7812 | 1.3375 | -262.8217 | -290.5133 | -2.9452 | -2.9906 |
0.4844 | 0.41 | 800 | 0.4998 | -0.7197 | -2.1472 | 0.7559 | 1.4275 | -265.0713 | -291.6787 | -2.7574 | -2.7909 |
0.4999 | 0.46 | 900 | 0.4983 | -0.5951 | -1.8837 | 0.7637 | 1.2886 | -261.8963 | -290.1770 | -2.9454 | -2.9806 |
0.45 | 0.52 | 1000 | 0.4916 | -0.6703 | -2.2026 | 0.7676 | 1.5323 | -265.7383 | -291.0830 | -2.9158 | -2.9444 |
0.5239 | 0.57 | 1100 | 0.4848 | -0.8068 | -2.1600 | 0.7695 | 1.3532 | -265.2255 | -292.7281 | -2.8454 | -2.8788 |
0.4766 | 0.62 | 1200 | 0.4974 | -0.5971 | -1.8739 | 0.7441 | 1.2769 | -261.7786 | -290.2007 | -2.8189 | -2.8673 |
0.497 | 0.67 | 1300 | 0.5048 | -1.0382 | -2.4646 | 0.7266 | 1.4264 | -268.8953 | -295.5161 | -2.8081 | -2.8508 |
0.5281 | 0.72 | 1400 | 0.5003 | -1.0137 | -2.1947 | 0.7559 | 1.1810 | -265.6436 | -295.2208 | -2.7945 | -2.8255 |
0.4428 | 0.77 | 1500 | 0.4851 | -0.8809 | -2.3005 | 0.7598 | 1.4196 | -266.9182 | -293.6202 | -2.7815 | -2.8139 |
0.5192 | 0.83 | 1600 | 0.4758 | -0.9091 | -2.3825 | 0.7539 | 1.4735 | -267.9066 | -293.9598 | -2.7394 | -2.7728 |
0.533 | 0.88 | 1700 | 0.4753 | -0.8150 | -2.1835 | 0.7676 | 1.3685 | -265.5082 | -292.8266 | -2.8005 | -2.8330 |
0.5803 | 0.93 | 1800 | 0.4854 | -0.6814 | -2.0356 | 0.7500 | 1.3542 | -263.7262 | -291.2166 | -2.7118 | -2.7542 |
0.4714 | 0.98 | 1900 | 0.4855 | -0.7688 | -2.1323 | 0.7559 | 1.3634 | -264.8912 | -292.2704 | -2.6864 | -2.7287 |
0.0702 | 1.03 | 2000 | 0.4988 | -1.4916 | -3.5339 | 0.7793 | 2.0423 | -281.7779 | -300.9782 | -2.6172 | -2.6670 |
0.0732 | 1.08 | 2100 | 0.5188 | -1.6274 | -3.8428 | 0.7793 | 2.2154 | -285.4998 | -302.6147 | -2.6360 | -2.6881 |
0.077 | 1.14 | 2200 | 0.5274 | -2.1510 | -4.2855 | 0.7812 | 2.1345 | -290.8334 | -308.9228 | -2.7288 | -2.7823 |
0.0673 | 1.19 | 2300 | 0.5169 | -1.7308 | -3.9343 | 0.7832 | 2.2035 | -286.6026 | -303.8600 | -2.6971 | -2.7569 |
0.1039 | 1.24 | 2400 | 0.5115 | -1.7156 | -3.7812 | 0.7715 | 2.0655 | -284.7573 | -303.6773 | -2.6974 | -2.7420 |
0.0961 | 1.29 | 2500 | 0.5290 | -2.3303 | -4.5271 | 0.7734 | 2.1968 | -293.7446 | -311.0832 | -2.7071 | -2.7485 |
0.1269 | 1.34 | 2600 | 0.5061 | -1.8237 | -3.7726 | 0.7695 | 1.9490 | -284.6546 | -304.9791 | -2.7066 | -2.7432 |
0.0959 | 1.39 | 2700 | 0.5066 | -1.8437 | -3.9127 | 0.7793 | 2.0690 | -286.3417 | -305.2205 | -2.7061 | -2.7584 |
0.1009 | 1.45 | 2800 | 0.5241 | -2.4471 | -4.6093 | 0.7852 | 2.1622 | -294.7356 | -312.4907 | -2.6836 | -2.7338 |
0.0917 | 1.5 | 2900 | 0.5350 | -2.4581 | -4.5278 | 0.7500 | 2.0697 | -293.7532 | -312.6228 | -2.7069 | -2.7588 |
0.0693 | 1.55 | 3000 | 0.5371 | -2.3570 | -4.5566 | 0.7578 | 2.1996 | -294.1000 | -311.4046 | -2.7179 | -2.7642 |
0.0861 | 1.6 | 3100 | 0.5141 | -2.1264 | -4.2158 | 0.7754 | 2.0894 | -289.9940 | -308.6270 | -2.7429 | -2.8104 |
0.0851 | 1.65 | 3200 | 0.5175 | -1.9273 | -4.0951 | 0.7695 | 2.1678 | -288.5394 | -306.2276 | -2.6925 | -2.7584 |
0.0837 | 1.7 | 3300 | 0.5354 | -2.0696 | -4.3985 | 0.7637 | 2.3289 | -292.1949 | -307.9421 | -2.6726 | -2.7440 |
0.056 | 1.76 | 3400 | 0.5596 | -2.7840 | -5.3198 | 0.7734 | 2.5358 | -303.2956 | -316.5497 | -2.6498 | -2.7202 |
0.0689 | 1.81 | 3500 | 0.5348 | -2.3076 | -4.5718 | 0.7812 | 2.2642 | -294.2832 | -310.8093 | -2.7109 | -2.7732 |
0.0934 | 1.86 | 3600 | 0.5539 | -2.6736 | -5.0332 | 0.7734 | 2.3596 | -299.8421 | -315.2191 | -2.6534 | -2.7272 |
0.0694 | 1.91 | 3700 | 0.5426 | -2.6655 | -4.9512 | 0.7695 | 2.2857 | -298.8542 | -315.1215 | -2.6730 | -2.7433 |
0.1267 | 1.96 | 3800 | 0.5620 | -2.8767 | -5.0299 | 0.7910 | 2.1531 | -299.8019 | -317.6664 | -2.6778 | -2.7430 |
0.024 | 2.01 | 3900 | 0.5618 | -2.9659 | -5.5768 | 0.7832 | 2.6109 | -306.3921 | -318.7414 | -2.6526 | -2.7240 |
0.0171 | 2.07 | 4000 | 0.6117 | -3.6584 | -6.6949 | 0.7793 | 3.0364 | -319.8622 | -327.0849 | -2.6017 | -2.6789 |
0.0112 | 2.12 | 4100 | 0.6536 | -3.8851 | -7.0803 | 0.7734 | 3.1953 | -324.5066 | -329.8155 | -2.6007 | -2.6772 |
0.0123 | 2.17 | 4200 | 0.6296 | -3.5916 | -6.6239 | 0.7734 | 3.0323 | -319.0072 | -326.2793 | -2.6019 | -2.6741 |
0.0135 | 2.22 | 4300 | 0.6245 | -3.6464 | -6.7754 | 0.7832 | 3.1289 | -320.8321 | -326.9404 | -2.5877 | -2.6570 |
0.0147 | 2.27 | 4400 | 0.6659 | -4.4576 | -7.8315 | 0.7832 | 3.3739 | -333.5571 | -336.7133 | -2.5400 | -2.6114 |
0.0193 | 2.32 | 4500 | 0.6365 | -4.0338 | -7.4212 | 0.7832 | 3.3874 | -328.6134 | -331.6075 | -2.4882 | -2.5622 |
0.0141 | 2.37 | 4600 | 0.6966 | -4.9177 | -8.5470 | 0.7930 | 3.6293 | -342.1769 | -342.2570 | -2.4891 | -2.5649 |
0.0126 | 2.43 | 4700 | 0.6972 | -4.9634 | -8.5921 | 0.7949 | 3.6287 | -342.7202 | -342.8073 | -2.4465 | -2.5246 |
0.0092 | 2.48 | 4800 | 0.6804 | -4.6987 | -8.2494 | 0.7832 | 3.5507 | -338.5913 | -339.6177 | -2.4977 | -2.5738 |
0.0232 | 2.53 | 4900 | 0.6465 | -4.1657 | -7.5350 | 0.7812 | 3.3694 | -329.9847 | -333.1960 | -2.5170 | -2.5959 |
0.0121 | 2.58 | 5000 | 0.6718 | -4.7636 | -8.3913 | 0.7910 | 3.6278 | -340.3017 | -340.3996 | -2.5250 | -2.6042 |
0.0104 | 2.63 | 5100 | 0.6863 | -4.7726 | -8.4937 | 0.7930 | 3.7212 | -341.5356 | -340.5081 | -2.5020 | -2.5831 |
0.0127 | 2.68 | 5200 | 0.7056 | -5.2268 | -9.0672 | 0.7910 | 3.8404 | -348.4451 | -345.9808 | -2.5054 | -2.5842 |
0.0057 | 2.74 | 5300 | 0.6886 | -4.8479 | -8.6269 | 0.7949 | 3.7790 | -343.1393 | -341.4157 | -2.5488 | -2.6248 |
0.0132 | 2.79 | 5400 | 0.6839 | -4.7008 | -8.4009 | 0.7930 | 3.7001 | -340.4170 | -339.6432 | -2.5501 | -2.6260 |
0.0103 | 2.84 | 5500 | 0.6880 | -4.8373 | -8.5695 | 0.7969 | 3.7322 | -342.4483 | -341.2881 | -2.5405 | -2.6167 |
0.0105 | 2.89 | 5600 | 0.6968 | -5.0538 | -8.8490 | 0.7852 | 3.7952 | -345.8162 | -343.8970 | -2.5383 | -2.6136 |
0.008 | 2.94 | 5700 | 0.6993 | -5.0988 | -8.9206 | 0.7871 | 3.8218 | -346.6779 | -344.4387 | -2.5373 | -2.6125 |
0.0047 | 2.99 | 5800 | 0.6975 | -5.0353 | -8.8422 | 0.7949 | 3.8069 | -345.7339 | -343.6734 | -2.5373 | -2.6125 |
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1
## Model tree for tianlinliu0121/zephyr-7b-dpo-full-beta-0.083

- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full