---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO

This model is a version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged) fine-tuned with Direct Preference Optimization (DPO) via TRL; the training dataset is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.0923
- Rewards/chosen: 1.3984
- Rewards/rejected: -6.4179
- Rewards/accuracies: 0.9643
- Rewards/margins: 7.8163
- Logps/rejected: -264.5786
- Logps/chosen: -189.8816
- Logits/rejected: -1.8496
- Logits/chosen: -1.8101
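
Since the card ships without usage code, here is a minimal sketch of loading the DPO adapter on top of the base model with PEFT. The adapter repo id is assumed from the model name above and should be verified on the Hub; the prompt is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
# Adapter repo id assumed from the model name in this card; verify on the Hub.
adapter_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO LoRA adapter

prompt = "Olá, tudo bem?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```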

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1470
- mixed_precision_training: Native AMP
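
For context, a minimal sketch of how these hyperparameters could map onto a TRL `DPOTrainer` run from this era (TRL paired with Transformers 4.38.2). The DPO `beta`, the dataset, and the model/tokenizer objects are assumptions; none of them are recorded in this card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

args = TrainingArguments(
    output_dir="WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.7-DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=1470,
    fp16=True,  # "Native AMP"; the card does not say whether fp16 or bf16 was used
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults,
    # so they need no explicit setting here.
)

trainer = DPOTrainer(
    model=model,                  # PEFT-wrapped base model (see the usage sketch above)
    ref_model=None,               # with PEFT adapters, TRL derives the reference model
    beta=0.1,                     # assumed; the DPO beta is not reported in this card
    args=args,
    train_dataset=train_dataset,  # preference pairs (prompt, chosen, rejected); dataset unknown
    tokenizer=tokenizer,
)
trainer.train()
```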

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6781        | 0.12  | 30   | 0.6762          | 0.0504         | 0.0157           | 0.75               | 0.0347          | -243.1332      | -194.3750    | -1.8308         | -1.7951       |
| 0.5918        | 0.24  | 60   | 0.5998          | 0.2476         | 0.0383           | 0.7857             | 0.2093          | -243.0578      | -193.7174    | -1.8333         | -1.7975       |
| 0.4932        | 0.37  | 90   | 0.5072          | 0.5622         | 0.0680           | 0.8214             | 0.4942          | -242.9590      | -192.6691    | -1.8364         | -1.8004       |
| 0.4391        | 0.49  | 120  | 0.4336          | 0.9734         | 0.1121           | 0.7857             | 0.8613          | -242.8120      | -191.2982    | -1.8413         | -1.8051       |
| 0.3208        | 0.61  | 150  | 0.3933          | 1.3961         | 0.0824           | 0.7857             | 1.3137          | -242.9110      | -189.8893    | -1.8492         | -1.8130       |
| 0.3215        | 0.73  | 180  | 0.3756          | 1.8483         | 0.0151           | 0.7857             | 1.8332          | -243.1354      | -188.3820    | -1.8562         | -1.8194       |
| 0.0817        | 0.86  | 210  | 0.3835          | 2.3139         | -0.1849          | 0.7857             | 2.4989          | -243.8021      | -186.8299    | -1.8641         | -1.8266       |
| 0.137         | 0.98  | 240  | 0.4132          | 2.5979         | -0.5021          | 0.75               | 3.1001          | -244.8594      | -185.8831    | -1.8722         | -1.8343       |
| 0.0997        | 1.1   | 270  | 0.4657          | 2.7384         | -1.0053          | 0.75               | 3.7438          | -246.5367      | -185.4148    | -1.8816         | -1.8430       |
| 0.0432        | 1.22  | 300  | 0.5011          | 2.7041         | -1.4771          | 0.75               | 4.1812          | -248.1093      | -185.5293    | -1.8884         | -1.8495       |
| 0.1819        | 1.35  | 330  | 0.4785          | 2.7004         | -1.8249          | 0.75               | 4.5253          | -249.2688      | -185.5418    | -1.8878         | -1.8487       |
| 0.0169        | 1.47  | 360  | 0.4872          | 2.6643         | -2.1577          | 0.75               | 4.8220          | -250.3781      | -185.6619    | -1.8907         | -1.8510       |
| 0.235         | 1.59  | 390  | 0.4886          | 2.6565         | -2.3834          | 0.75               | 5.0399          | -251.1302      | -185.6880    | -1.8930         | -1.8532       |
| 0.7551        | 1.71  | 420  | 0.4380          | 2.7229         | -2.3468          | 0.75               | 5.0697          | -251.0082      | -185.4665    | -1.8921         | -1.8527       |
| 0.134         | 1.84  | 450  | 0.4383          | 2.6666         | -2.5566          | 0.75               | 5.2232          | -251.7077      | -185.6543    | -1.8925         | -1.8531       |
| 0.0662        | 1.96  | 480  | 0.4448          | 2.5586         | -2.9192          | 0.75               | 5.4778          | -252.9164      | -186.0143    | -1.8964         | -1.8569       |
| 0.1093        | 2.08  | 510  | 0.4262          | 2.5211         | -3.0726          | 0.75               | 5.5937          | -253.4277      | -186.1394    | -1.8955         | -1.8561       |
| 0.1557        | 2.2   | 540  | 0.4264          | 2.3694         | -3.4198          | 0.75               | 5.7892          | -254.5848      | -186.6449    | -1.8965         | -1.8566       |
| 0.0962        | 2.33  | 570  | 0.4182          | 2.2640         | -3.7076          | 0.75               | 5.9716          | -255.5444      | -186.9964    | -1.8978         | -1.8582       |
| 0.0437        | 2.45  | 600  | 0.3824          | 2.2618         | -3.7757          | 0.75               | 6.0375          | -255.7713      | -187.0037    | -1.8933         | -1.8534       |
| 0.0278        | 2.57  | 630  | 0.3571          | 2.3503         | -3.7557          | 0.8571             | 6.1060          | -255.7046      | -186.7086    | -1.8932         | -1.8536       |
| 0.2399        | 2.69  | 660  | 0.3313          | 2.3025         | -3.9256          | 0.8571             | 6.2281          | -256.2710      | -186.8678    | -1.8909         | -1.8512       |
| 0.039         | 2.82  | 690  | 0.3131          | 2.2138         | -4.1650          | 0.8929             | 6.3789          | -257.0691      | -187.1635    | -1.8906         | -1.8510       |
| 0.3389        | 2.94  | 720  | 0.2763          | 2.2605         | -4.2160          | 0.8929             | 6.4765          | -257.2390      | -187.0079    | -1.8873         | -1.8480       |
| 0.0154        | 3.06  | 750  | 0.2704          | 2.2526         | -4.3017          | 0.8929             | 6.5544          | -257.5247      | -187.0342    | -1.8862         | -1.8470       |
| 0.021         | 3.18  | 780  | 0.2422          | 2.2548         | -4.3438          | 0.8929             | 6.5986          | -257.6650      | -187.0270    | -1.8838         | -1.8448       |
| 0.0614        | 3.31  | 810  | 0.2144          | 2.2331         | -4.4495          | 0.8929             | 6.6826          | -258.0172      | -187.0992    | -1.8805         | -1.8417       |
| 0.0529        | 3.43  | 840  | 0.2121          | 2.1562         | -4.6740          | 0.8929             | 6.8302          | -258.7657      | -187.3555    | -1.8809         | -1.8423       |
| 0.001         | 3.55  | 870  | 0.2092          | 2.1034         | -4.8454          | 0.8929             | 6.9487          | -259.3368      | -187.5317    | -1.8799         | -1.8410       |
| 0.0284        | 3.67  | 900  | 0.2006          | 1.9814         | -5.1388          | 0.8929             | 7.1202          | -260.3150      | -187.9384    | -1.8760         | -1.8366       |
| 0.0744        | 3.8   | 930  | 0.1813          | 1.9437         | -5.2351          | 0.8929             | 7.1788          | -260.6358      | -188.0639    | -1.8733         | -1.8339       |
| 0.091         | 3.92  | 960  | 0.1722          | 1.8333         | -5.4335          | 0.8929             | 7.2668          | -261.2973      | -188.4319    | -1.8707         | -1.8313       |
| 0.3504        | 4.04  | 990  | 0.1487          | 1.8678         | -5.3589          | 0.9286             | 7.2268          | -261.0488      | -188.3168    | -1.8672         | -1.8279       |
| 0.0071        | 4.16  | 1020 | 0.1403          | 1.7989         | -5.5185          | 0.9286             | 7.3173          | -261.5805      | -188.5468    | -1.8637         | -1.8243       |
| 0.0131        | 4.29  | 1050 | 0.1312          | 1.8050         | -5.5495          | 0.9286             | 7.3545          | -261.6841      | -188.5262    | -1.8616         | -1.8222       |
| 0.0868        | 4.41  | 1080 | 0.1210          | 1.7626         | -5.6284          | 0.9286             | 7.3911          | -261.9471      | -188.6675    | -1.8587         | -1.8195       |
| 0.0041        | 4.53  | 1110 | 0.1206          | 1.6865         | -5.7780          | 0.9286             | 7.4645          | -262.4456      | -188.9213    | -1.8566         | -1.8173       |
| 0.0107        | 4.65  | 1140 | 0.1178          | 1.6370         | -5.8895          | 0.9643             | 7.5266          | -262.8174      | -189.0862    | -1.8563         | -1.8171       |
| 0.0084        | 4.78  | 1170 | 0.1123          | 1.6107         | -5.9365          | 0.9643             | 7.5471          | -262.9738      | -189.1741    | -1.8552         | -1.8159       |
| 0.0049        | 4.9   | 1200 | 0.1083          | 1.5710         | -6.0495          | 0.9643             | 7.6206          | -263.3507      | -189.3061    | -1.8545         | -1.8151       |
| 0.0746        | 5.02  | 1230 | 0.1034          | 1.5328         | -6.1286          | 0.9643             | 7.6614          | -263.6144      | -189.4336    | -1.8535         | -1.8140       |
| 0.0091        | 5.14  | 1260 | 0.1031          | 1.4764         | -6.2562          | 0.9643             | 7.7327          | -264.0397      | -189.6215    | -1.8531         | -1.8136       |
| 0.0526        | 5.27  | 1290 | 0.0997          | 1.4526         | -6.3037          | 0.9643             | 7.7564          | -264.1981      | -189.7009    | -1.8528         | -1.8133       |
| 0.0316        | 5.39  | 1320 | 0.0965          | 1.4471         | -6.3114          | 0.9643             | 7.7585          | -264.2236      | -189.7192    | -1.8517         | -1.8124       |
| 0.0249        | 5.51  | 1350 | 0.0950          | 1.4370         | -6.3384          | 0.9643             | 7.7755          | -264.3138      | -189.7529    | -1.8509         | -1.8115       |
| 0.2078        | 5.63  | 1380 | 0.0937          | 1.4141         | -6.3790          | 0.9643             | 7.7931          | -264.4489      | -189.8293    | -1.8504         | -1.8111       |
| 0.013         | 5.76  | 1410 | 0.0926          | 1.4237         | -6.3666          | 0.9643             | 7.7902          | -264.4076      | -189.7974    | -1.8498         | -1.8103       |
| 0.0194        | 5.88  | 1440 | 0.0923          | 1.3984         | -6.4179          | 0.9643             | 7.8163          | -264.5786      | -189.8816    | -1.8496         | -1.8101       |
| 0.0111        | 6.0   | 1470 | 0.0919          | 1.3959         | -6.4219          | 0.9643             | 7.8179          | -264.5919      | -189.8898    | -1.8495         | -1.8100       |


### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2