Commit 95a69a1
Parent(s): 0b11e33
Update README.md
README.md CHANGED
@@ -27,15 +27,15 @@ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/m
 The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).

 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: 0.
-- Rewards/rejected: -
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: 0.
-- Logits/chosen: 0.

 ## Model description

@@ -146,22 +146,22 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.


 ### Framework versions

 The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).

 It achieves the following results on the evaluation set:
+- Loss: 0.4537
+- Rewards/chosen: -0.0837
+- Rewards/rejected: -1.2628
+- Rewards/accuracies: 0.8301
+- Rewards/margins: 1.1791
+- Logps/rejected: -224.8409
+- Logps/chosen: -203.2228
+- Logits/rejected: 0.4773
+- Logits/chosen: 0.3062

 ## Model description

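The updated evaluation metrics can be sanity-checked against each other. The sketch below assumes that `Rewards/margins` is the mean of chosen minus rejected rewards, which is the convention in preference-optimization trainers such as TRL's `DPOTrainer`; the README excerpt does not name the trainer, so this is an assumption, and the variable names are illustrative.

```python
# Final evaluation metrics copied from the updated README lines.
rewards_chosen = -0.0837
rewards_rejected = -1.2628
reported_margin = 1.1791

# Assumption: the margin is the mean of (chosen - rejected) rewards,
# which equals the difference of the two reported means.
computed_margin = rewards_chosen - rewards_rejected

# Allow a small tolerance because the README values are rounded to 4 decimals.
margin_matches = abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

Under that assumption the reported margin is consistent with the two reward values, which suggests the numbers were filled in together from a single evaluation run.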

 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6853 | 0.06 | 20 | 0.6701 | 0.0133 | -0.0368 | 0.6905 | 0.0501 | -212.5803 | -202.2522 | 0.3853 | 0.2532 |
+| 0.6312 | 0.12 | 40 | 0.5884 | 0.0422 | -0.2208 | 0.8138 | 0.2630 | -214.4207 | -201.9638 | 0.4254 | 0.2816 |
+| 0.547 | 0.19 | 60 | 0.5146 | 0.0172 | -0.5786 | 0.8278 | 0.5958 | -217.9983 | -202.2132 | 0.4699 | 0.3110 |
+| 0.4388 | 0.25 | 80 | 0.4893 | -0.0808 | -1.0789 | 0.8293 | 0.9981 | -223.0014 | -203.1934 | 0.5158 | 0.3396 |
+| 0.4871 | 0.31 | 100 | 0.4818 | -0.1298 | -1.2346 | 0.8297 | 1.1048 | -224.5586 | -203.6837 | 0.5133 | 0.3340 |
+| 0.4863 | 0.37 | 120 | 0.4723 | -0.1230 | -1.1718 | 0.8301 | 1.0488 | -223.9305 | -203.6159 | 0.4910 | 0.3167 |
+| 0.4578 | 0.44 | 140 | 0.4666 | -0.1257 | -1.1772 | 0.8301 | 1.0515 | -223.9844 | -203.6428 | 0.4795 | 0.3078 |
+| 0.4587 | 0.5 | 160 | 0.4625 | -0.0746 | -1.1272 | 0.8301 | 1.0526 | -223.4841 | -203.1310 | 0.4857 | 0.3139 |
+| 0.4688 | 0.56 | 180 | 0.4595 | -0.0584 | -1.1194 | 0.8297 | 1.0610 | -223.4062 | -202.9692 | 0.4890 | 0.3171 |
+| 0.4189 | 0.62 | 200 | 0.4579 | -0.0666 | -1.1647 | 0.8297 | 1.0982 | -223.8598 | -203.0511 | 0.4858 | 0.3138 |
+| 0.4392 | 0.68 | 220 | 0.4564 | -0.0697 | -1.1915 | 0.8301 | 1.1219 | -224.1278 | -203.0823 | 0.4824 | 0.3110 |
+| 0.4659 | 0.75 | 240 | 0.4554 | -0.0826 | -1.2245 | 0.8301 | 1.1419 | -224.4574 | -203.2112 | 0.4761 | 0.3052 |
+| 0.4075 | 0.81 | 260 | 0.4544 | -0.0823 | -1.2328 | 0.8301 | 1.1504 | -224.5403 | -203.2089 | 0.4749 | 0.3044 |
+| 0.4015 | 0.87 | 280 | 0.4543 | -0.0833 | -1.2590 | 0.8301 | 1.1757 | -224.8026 | -203.2188 | 0.4779 | 0.3067 |
+| 0.4365 | 0.93 | 300 | 0.4539 | -0.0846 | -1.2658 | 0.8301 | 1.1812 | -224.8702 | -203.2313 | 0.4780 | 0.3067 |
+| 0.4589 | 1.0 | 320 | 0.4537 | -0.0837 | -1.2628 | 0.8301 | 1.1791 | -224.8409 | -203.2228 | 0.4773 | 0.3062 |


 ### Framework versions

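As a quick reading of the updated training-results table, the following sketch copies the Step and Validation Loss columns and checks whether the loss falls at every evaluation. The variable names are illustrative and not part of the README.

```python
# (step, validation loss) pairs copied from the updated training-results table.
val_loss_by_step = [
    (20, 0.6701), (40, 0.5884), (60, 0.5146), (80, 0.4893),
    (100, 0.4818), (120, 0.4723), (140, 0.4666), (160, 0.4625),
    (180, 0.4595), (200, 0.4579), (220, 0.4564), (240, 0.4554),
    (260, 0.4544), (280, 0.4543), (300, 0.4539), (320, 0.4537),
]

losses = [loss for _, loss in val_loss_by_step]

# True when every evaluation improves on the previous one.
strictly_decreasing = all(prev > curr for prev, curr in zip(losses, losses[1:]))

# Step with the lowest validation loss.
best_step, best_loss = min(val_loss_by_step, key=lambda pair: pair[1])

# Improvement between the last two evaluations, rounded like the table.
last_improvement = round(losses[-2] - losses[-1], 4)

print(strictly_decreasing, best_step, best_loss, last_improvement)
```

The lowest validation loss falls on the final row, which is the same value the README reports as the evaluation-set loss.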