---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO
  results: []
---

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO

This model is a fine-tuned version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0923
- Rewards/chosen: 1.3984
- Rewards/rejected: -6.4179
- Rewards/accuracies: 0.9643
- Rewards/margins: 7.8163
- Logps/rejected: -264.5786
- Logps/chosen: -189.8816
- Logits/rejected: -1.8496
- Logits/chosen: -1.8101

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1470
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6781 | 0.12 | 30 | 0.6762 | 0.0504 | 0.0157 | 0.75 | 0.0347 | -243.1332 | -194.3750 | -1.8308 | -1.7951 |
| 0.5918 | 0.24 | 60 | 0.5998 | 0.2476 | 0.0383 | 0.7857 | 0.2093 | -243.0578 | -193.7174 | -1.8333 | -1.7975 |
| 0.4932 | 0.37 | 90 | 0.5072 | 0.5622 | 0.0680 | 0.8214 | 0.4942 | -242.9590 | -192.6691 | -1.8364 | -1.8004 |
| 0.4391 | 0.49 | 120 | 0.4336 | 0.9734 | 0.1121 | 0.7857 | 0.8613 | -242.8120 | -191.2982 | -1.8413 | -1.8051 |
| 0.3208 | 0.61 | 150 | 0.3933 | 1.3961 | 0.0824 | 0.7857 | 1.3137 | -242.9110 | -189.8893 | -1.8492 | -1.8130 |
| 0.3215 | 0.73 | 180 | 0.3756 | 1.8483 | 0.0151 | 0.7857 | 1.8332 | -243.1354 | -188.3820 | -1.8562 | -1.8194 |
| 0.0817 | 0.86 | 210 | 0.3835 | 2.3139 | -0.1849 | 0.7857 | 2.4989 | -243.8021 | -186.8299 | -1.8641 | -1.8266 |
| 0.137 | 0.98 | 240 | 0.4132 | 2.5979 | -0.5021 | 0.75 | 3.1001 | -244.8594 | -185.8831 | -1.8722 | -1.8343 |
| 0.0997 | 1.1 | 270 | 0.4657 | 2.7384 | -1.0053 | 0.75 | 3.7438 | -246.5367 | -185.4148 | -1.8816 | -1.8430 |
| 0.0432 | 1.22 | 300 | 0.5011 | 2.7041 | -1.4771 | 0.75 | 4.1812 | -248.1093 | -185.5293 | -1.8884 | -1.8495 |
| 0.1819 | 1.35 | 330 | 0.4785 | 2.7004 | -1.8249 | 0.75 | 4.5253 | -249.2688 | -185.5418 | -1.8878 | -1.8487 |
| 0.0169 | 1.47 | 360 | 0.4872 | 2.6643 | -2.1577 | 0.75 | 4.8220 | -250.3781 | -185.6619 | -1.8907 | -1.8510 |
| 0.235 | 1.59 | 390 | 0.4886 | 2.6565 | -2.3834 | 0.75 | 5.0399 | -251.1302 | -185.6880 | -1.8930 | -1.8532 |
| 0.7551 | 1.71 | 420 | 0.4380 | 2.7229 | -2.3468 | 0.75 | 5.0697 | -251.0082 | -185.4665 | -1.8921 | -1.8527 |
| 0.134 | 1.84 | 450 | 0.4383 | 2.6666 | -2.5566 | 0.75 | 5.2232 | -251.7077 | -185.6543 | -1.8925 | -1.8531 |
| 0.0662 | 1.96 | 480 | 0.4448 | 2.5586 | -2.9192 | 0.75 | 5.4778 | -252.9164 | -186.0143 | -1.8964 | -1.8569 |
| 0.1093 | 2.08 | 510 | 0.4262 | 2.5211 | -3.0726 | 0.75 | 5.5937 | -253.4277 | -186.1394 | -1.8955 | -1.8561 |
| 0.1557 | 2.2 | 540 | 0.4264 | 2.3694 | -3.4198 | 0.75 | 5.7892 | -254.5848 | -186.6449 | -1.8965 | -1.8566 |
| 0.0962 | 2.33 | 570 | 0.4182 | 2.2640 | -3.7076 | 0.75 | 5.9716 | -255.5444 | -186.9964 | -1.8978 | -1.8582 |
| 0.0437 | 2.45 | 600 | 0.3824 | 2.2618 | -3.7757 | 0.75 | 6.0375 | -255.7713 | -187.0037 | -1.8933 | -1.8534 |
| 0.0278 | 2.57 | 630 | 0.3571 | 2.3503 | -3.7557 | 0.8571 | 6.1060 | -255.7046 | -186.7086 | -1.8932 | -1.8536 |
| 0.2399 | 2.69 | 660 | 0.3313 | 2.3025 | -3.9256 | 0.8571 | 6.2281 | -256.2710 | -186.8678 | -1.8909 | -1.8512 |
| 0.039 | 2.82 | 690 | 0.3131 | 2.2138 | -4.1650 | 0.8929 | 6.3789 | -257.0691 | -187.1635 | -1.8906 | -1.8510 |
| 0.3389 | 2.94 | 720 | 0.2763 | 2.2605 | -4.2160 | 0.8929 | 6.4765 | -257.2390 | -187.0079 | -1.8873 | -1.8480 |
| 0.0154 | 3.06 | 750 | 0.2704 | 2.2526 | -4.3017 | 0.8929 | 6.5544 | -257.5247 | -187.0342 | -1.8862 | -1.8470 |
| 0.021 | 3.18 | 780 | 0.2422 | 2.2548 | -4.3438 | 0.8929 | 6.5986 | -257.6650 | -187.0270 | -1.8838 | -1.8448 |
| 0.0614 | 3.31 | 810 | 0.2144 | 2.2331 | -4.4495 | 0.8929 | 6.6826 | -258.0172 | -187.0992 | -1.8805 | -1.8417 |
| 0.0529 | 3.43 | 840 | 0.2121 | 2.1562 | -4.6740 | 0.8929 | 6.8302 | -258.7657 | -187.3555 | -1.8809 | -1.8423 |
| 0.001 | 3.55 | 870 | 0.2092 | 2.1034 | -4.8454 | 0.8929 | 6.9487 | -259.3368 | -187.5317 | -1.8799 | -1.8410 |
| 0.0284 | 3.67 | 900 | 0.2006 | 1.9814 | -5.1388 | 0.8929 | 7.1202 | -260.3150 | -187.9384 | -1.8760 | -1.8366 |
| 0.0744 | 3.8 | 930 | 0.1813 | 1.9437 | -5.2351 | 0.8929 | 7.1788 | -260.6358 | -188.0639 | -1.8733 | -1.8339 |
| 0.091 | 3.92 | 960 | 0.1722 | 1.8333 | -5.4335 | 0.8929 | 7.2668 | -261.2973 | -188.4319 | -1.8707 | -1.8313 |
| 0.3504 | 4.04 | 990 | 0.1487 | 1.8678 | -5.3589 | 0.9286 | 7.2268 | -261.0488 | -188.3168 | -1.8672 | -1.8279 |
| 0.0071 | 4.16 | 1020 | 0.1403 | 1.7989 | -5.5185 | 0.9286 | 7.3173 | -261.5805 | -188.5468 | -1.8637 | -1.8243 |
| 0.0131 | 4.29 | 1050 | 0.1312 | 1.8050 | -5.5495 | 0.9286 | 7.3545 | -261.6841 | -188.5262 | -1.8616 | -1.8222 |
| 0.0868 | 4.41 | 1080 | 0.1210 | 1.7626 | -5.6284 | 0.9286 | 7.3911 | -261.9471 | -188.6675 | -1.8587 | -1.8195 |
| 0.0041 | 4.53 | 1110 | 0.1206 | 1.6865 | -5.7780 | 0.9286 | 7.4645 | -262.4456 | -188.9213 | -1.8566 | -1.8173 |
| 0.0107 | 4.65 | 1140 | 0.1178 | 1.6370 | -5.8895 | 0.9643 | 7.5266 | -262.8174 | -189.0862 | -1.8563 | -1.8171 |
| 0.0084 | 4.78 | 1170 | 0.1123 | 1.6107 | -5.9365 | 0.9643 | 7.5471 | -262.9738 | -189.1741 | -1.8552 | -1.8159 |
| 0.0049 | 4.9 | 1200 | 0.1083 | 1.5710 | -6.0495 | 0.9643 | 7.6206 | -263.3507 | -189.3061 | -1.8545 | -1.8151 |
| 0.0746 | 5.02 | 1230 | 0.1034 | 1.5328 | -6.1286 | 0.9643 | 7.6614 | -263.6144 | -189.4336 | -1.8535 | -1.8140 |
| 0.0091 | 5.14 | 1260 | 0.1031 | 1.4764 | -6.2562 | 0.9643 | 7.7327 | -264.0397 | -189.6215 | -1.8531 | -1.8136 |
| 0.0526 | 5.27 | 1290 | 0.0997 | 1.4526 | -6.3037 | 0.9643 | 7.7564 | -264.1981 | -189.7009 | -1.8528 | -1.8133 |
| 0.0316 | 5.39 | 1320 | 0.0965 | 1.4471 | -6.3114 | 0.9643 | 7.7585 | -264.2236 | -189.7192 | -1.8517 | -1.8124 |
| 0.0249 | 5.51 | 1350 | 0.0950 | 1.4370 | -6.3384 | 0.9643 | 7.7755 | -264.3138 | -189.7529 | -1.8509 | -1.8115 |
| 0.2078 | 5.63 | 1380 | 0.0937 | 1.4141 | -6.3790 | 0.9643 | 7.7931 | -264.4489 | -189.8293 | -1.8504 | -1.8111 |
| 0.013 | 5.76 | 1410 | 0.0926 | 1.4237 | -6.3666 | 0.9643 | 7.7902 | -264.4076 | -189.7974 | -1.8498 | -1.8103 |
| 0.0194 | 5.88 | 1440 | 0.0923 | 1.3984 | -6.4179 | 0.9643 | 7.8163 | -264.5786 | -189.8816 | -1.8496 | -1.8101 |
| 0.0111 | 6.0 | 1470 | 0.0919 | 1.3959 | -6.4219 | 0.9643 | 7.8179 | -264.5919 | -189.8898 | -1.8495 | -1.8100 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
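
### Reading the reward columns

The reward columns in the training results are related: in DPO logging, `Rewards/margins` is the mean chosen reward minus the mean rejected reward. The sketch below checks that relationship against a few rows copied from the table. The names `REPORTED_ROWS` and `reward_margin` are hypothetical helpers introduced for illustration, and the tolerance only absorbs the four-decimal rounding in the card.

```python
# Rows copied from the training results table:
# step -> (Rewards/chosen, Rewards/rejected, Rewards/margins as reported)
REPORTED_ROWS = {
    30: (0.0504, 0.0157, 0.0347),
    210: (2.3139, -0.1849, 2.4989),
    1440: (1.3984, -6.4179, 7.8163),
    1470: (1.3959, -6.4219, 7.8179),
}

# Values in the card are rounded to four decimals, so allow a small slack.
ROUNDING_TOLERANCE = 2e-4


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the chosen and rejected rewards (hypothetical helper)."""
    return chosen - rejected


def margins_are_consistent(rows: dict) -> bool:
    """Return True if every reported margin matches chosen - rejected."""
    return all(
        abs(reward_margin(chosen, rejected) - reported) <= ROUNDING_TOLERANCE
        for chosen, rejected, reported in rows.values()
    )


if __name__ == "__main__":
    for step, (chosen, rejected, reported) in REPORTED_ROWS.items():
        computed = reward_margin(chosen, rejected)
        print(f"step {step}: computed={computed:.4f} reported={reported:.4f}")
    print("consistent:", margins_are_consistent(REPORTED_ROWS))
```

A growing margin here comes mostly from the rejected reward falling (0.0157 at step 30 to -6.4219 at step 1470), while the chosen reward peaks around step 270 and then declines.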
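
### Sketch of the learning-rate schedule

The hyperparameters list a linear scheduler with `lr_scheduler_warmup_ratio: 0.03` over 1470 training steps and a peak learning rate of 5e-06. The following is a minimal sketch of what such a schedule looks like, assuming the usual linear warmup followed by linear decay to zero, and assuming the warmup step count is the ratio times the total steps rounded up. The card does not state the warmup step count, and `linear_lr` is a hypothetical helper, not a function taken from the training code.

```python
import math

# Values taken from the hyperparameter list in this card.
PEAK_LR = 5e-6
TOTAL_STEPS = 1470
WARMUP_RATIO = 0.03

# Assumption: warmup steps are the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Rises linearly from 0 to PEAK_LR during warmup, then decays
    linearly to 0 at TOTAL_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the rate at the start, end of warmup, midpoint, and final step.
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:4d}: lr = {linear_lr(step):.3e}")
```

Under these assumptions warmup covers the first 45 steps, so the first evaluation at step 30 would still fall inside the warmup phase.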