ironrock committed
Commit 2ae333a
Parent(s): 6f95d02

Model save

Files changed (2):
1. README.md +95 -0
2. adapter_model.safetensors +1 -1
README.md ADDED
---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO

This model is a fine-tuned version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4296
- Rewards/chosen: 2.1700
- Rewards/rejected: -0.6894
- Rewards/accuracies: 0.4286
- Rewards/margins: 2.8595
- Logps/rejected: -98.0954
- Logps/chosen: -47.9682
- Logits/rejected: -1.8433
- Logits/chosen: -1.8191
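
Per the `trl`/`dpo` tags, this is a DPO-trained PEFT adapter on top of the merged SFT base model. A minimal inference sketch follows, assuming the adapter is published under the model-index name above (`Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO`); the exact adapter repo id is not stated in this card.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
adapter_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the DPO-trained LoRA adapter to the merged SFT base model.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```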
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 732
- mixed_precision_training: Native AMP
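
As a hedged illustration only, these hyperparameters map onto a `trl` `DPOTrainer` setup (API of the trl releases contemporary with Transformers 4.38) roughly as follows. The preference dataset, the LoRA configuration, and the DPO `beta` are not stated in this card; the placeholders below (`preferences.json`, default `LoraConfig`, `beta=0.1`) are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical preference data with "prompt"/"chosen"/"rejected" columns;
# the actual training data is unknown.
dataset = load_dataset("json", data_files="preferences.json")["train"]

args = TrainingArguments(
    output_dir="WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # total train batch size 4
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=732,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=30,                   # matches the 30-step cadence in the results table
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # reference model handled implicitly when using PEFT
    args=args,
    beta=0.1,                        # trl default; actual value unknown
    train_dataset=dataset,
    eval_dataset=dataset,            # placeholder; actual eval split unknown
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # actual LoRA settings unknown
)
trainer.train()
```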

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.665 | 0.49 | 30 | 0.6322 | 0.1457 | 0.0118 | 0.4286 | 0.1339 | -95.7578 | -54.7160 | -1.7981 | -1.7801 |
| 0.5445 | 0.98 | 60 | 0.5272 | 0.4942 | 0.0186 | 0.4286 | 0.4756 | -95.7351 | -53.5543 | -1.8031 | -1.7844 |
| 0.4616 | 1.46 | 90 | 0.4622 | 0.9572 | -0.0104 | 0.4286 | 0.9676 | -95.8320 | -52.0111 | -1.8099 | -1.7902 |
| 0.5327 | 1.95 | 120 | 0.4424 | 1.3376 | -0.0803 | 0.4286 | 1.4179 | -96.0649 | -50.7429 | -1.8171 | -1.7963 |
| 0.5459 | 2.44 | 150 | 0.4335 | 1.6435 | -0.1846 | 0.4286 | 1.8281 | -96.4125 | -49.7233 | -1.8243 | -1.8025 |
| 0.4055 | 2.93 | 180 | 0.4326 | 1.8624 | -0.3390 | 0.4286 | 2.2014 | -96.9273 | -48.9936 | -1.8301 | -1.8074 |
| 0.4694 | 3.41 | 210 | 0.4311 | 1.9971 | -0.4435 | 0.4286 | 2.4406 | -97.2756 | -48.5445 | -1.8368 | -1.8136 |
| 0.5431 | 3.9 | 240 | 0.4247 | 2.0881 | -0.5490 | 0.4286 | 2.6371 | -97.6273 | -48.2414 | -1.8401 | -1.8164 |
| 0.4547 | 4.39 | 270 | 0.4296 | 2.1700 | -0.6894 | 0.4286 | 2.8595 | -98.0954 | -47.9682 | -1.8433 | -1.8191 |
| 0.3606 | 4.88 | 300 | 0.4290 | 2.2236 | -0.7919 | 0.4286 | 3.0155 | -98.4369 | -47.7897 | -1.8460 | -1.8213 |
| 0.4021 | 5.37 | 330 | 0.4302 | 2.2553 | -0.9243 | 0.4286 | 3.1797 | -98.8783 | -47.6839 | -1.8471 | -1.8219 |
| 0.419 | 5.85 | 360 | 0.4336 | 2.2579 | -1.0063 | 0.4286 | 3.2642 | -99.1514 | -47.6751 | -1.8470 | -1.8214 |
| 0.3984 | 6.34 | 390 | 0.4291 | 2.2716 | -1.0712 | 0.4286 | 3.3428 | -99.3678 | -47.6296 | -1.8499 | -1.8243 |
| 0.435 | 6.83 | 420 | 0.4285 | 2.2724 | -1.1240 | 0.4286 | 3.3965 | -99.5441 | -47.6268 | -1.8495 | -1.8236 |
| 0.5148 | 7.32 | 450 | 0.4309 | 2.2693 | -1.2113 | 0.4286 | 3.4806 | -99.8349 | -47.6373 | -1.8482 | -1.8220 |
| 0.412 | 7.8 | 480 | 0.4308 | 2.2647 | -1.2626 | 0.4286 | 3.5273 | -100.0060 | -47.6527 | -1.8481 | -1.8217 |
| 0.4911 | 8.29 | 510 | 0.4331 | 2.2554 | -1.3097 | 0.4286 | 3.5651 | -100.1629 | -47.6835 | -1.8466 | -1.8200 |
| 0.4433 | 8.78 | 540 | 0.4317 | 2.2453 | -1.3403 | 0.4286 | 3.5855 | -100.2648 | -47.7174 | -1.8468 | -1.8201 |
| 0.3813 | 9.27 | 570 | 0.4338 | 2.2396 | -1.3854 | 0.4286 | 3.6250 | -100.4154 | -47.7365 | -1.8473 | -1.8205 |
| 0.5026 | 9.76 | 600 | 0.4333 | 2.2386 | -1.4022 | 0.4286 | 3.6408 | -100.4712 | -47.7397 | -1.8472 | -1.8203 |
| 0.3121 | 10.24 | 630 | 0.4324 | 2.2339 | -1.4158 | 0.4286 | 3.6497 | -100.5166 | -47.7553 | -1.8462 | -1.8193 |
| 0.4165 | 10.73 | 660 | 0.4319 | 2.2307 | -1.4318 | 0.4286 | 3.6625 | -100.5699 | -47.7659 | -1.8462 | -1.8193 |
| 0.5328 | 11.22 | 690 | 0.4329 | 2.2254 | -1.4478 | 0.4286 | 3.6732 | -100.6233 | -47.7837 | -1.8457 | -1.8186 |
| 0.4046 | 11.71 | 720 | 0.4335 | 2.2229 | -1.4565 | 0.4286 | 3.6793 | -100.6521 | -47.7921 | -1.8454 | -1.8183 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
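
A quick sanity check (a sketch, not part of the original card) that a local environment matches the pinned versions above:

```python
# Compare installed versions against those listed in "Framework versions".
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.10.0",
    "transformers": "4.38.2",
    "torch": "2.1.0+cu118",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
for mod in (peft, transformers, torch, datasets, tokenizers):
    name = mod.__name__
    status = "OK" if mod.__version__ == expected[name] else f"got {mod.__version__}"
    print(f"{name}: expected {expected[name]}, {status}")
```
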
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2019e7c2d0f74845a62bb36150fa64d108d81519b8f3e5b08838aa277d901f4
+oid sha256:4fc4205bc93d6795cf17821f662a891c4d28c9b793aea2722bad767836694c35
 size 13648432