ironrock committed
Commit 2ae333a
Parent(s): 6f95d02

Model save

Files changed (2):
1. README.md +95 -0
2. adapter_model.safetensors +1 -1
README.md ADDED
---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO

This model is a fine-tuned version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4296
- Rewards/chosen: 2.1700
- Rewards/rejected: -0.6894
- Rewards/accuracies: 0.4286
- Rewards/margins: 2.8595
- Logps/rejected: -98.0954
- Logps/chosen: -47.9682
- Logits/rejected: -1.8433
- Logits/chosen: -1.8191
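
Per the `trl`/`dpo` tags, this is a DPO-trained PEFT adapter on top of the merged SFT base model. A minimal inference sketch follows, assuming the adapter is published under the model-index name above (`Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO`); the exact adapter repo id is not stated in this card.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
adapter_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the DPO-trained LoRA adapter to the merged SFT base model.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```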
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 732
- mixed_precision_training: Native AMP
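
As a hedged illustration only, these hyperparameters map onto a `trl` `DPOTrainer` setup (API of the trl releases contemporary with Transformers 4.38) roughly as follows. The preference dataset, the LoRA configuration, and the DPO `beta` are not stated in this card; the placeholders below (`preferences.json`, default `LoraConfig`, `beta=0.1`) are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical preference data with "prompt"/"chosen"/"rejected" columns;
# the actual training data is unknown.
dataset = load_dataset("json", data_files="preferences.json")["train"]

args = TrainingArguments(
    output_dir="WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.4-DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # total train batch size 4
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=732,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=30,                   # matches the 30-step cadence in the results table
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # reference model handled implicitly when using PEFT
    args=args,
    beta=0.1,                        # trl default; actual value unknown
    train_dataset=dataset,
    eval_dataset=dataset,            # placeholder; actual eval split unknown
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # actual LoRA settings unknown
)
trainer.train()
```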

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.665 | 0.49 | 30 | 0.6322 | 0.1457 | 0.0118 | 0.4286 | 0.1339 | -95.7578 | -54.7160 | -1.7981 | -1.7801 |
| 0.5445 | 0.98 | 60 | 0.5272 | 0.4942 | 0.0186 | 0.4286 | 0.4756 | -95.7351 | -53.5543 | -1.8031 | -1.7844 |
| 0.4616 | 1.46 | 90 | 0.4622 | 0.9572 | -0.0104 | 0.4286 | 0.9676 | -95.8320 | -52.0111 | -1.8099 | -1.7902 |
| 0.5327 | 1.95 | 120 | 0.4424 | 1.3376 | -0.0803 | 0.4286 | 1.4179 | -96.0649 | -50.7429 | -1.8171 | -1.7963 |
| 0.5459 | 2.44 | 150 | 0.4335 | 1.6435 | -0.1846 | 0.4286 | 1.8281 | -96.4125 | -49.7233 | -1.8243 | -1.8025 |
| 0.4055 | 2.93 | 180 | 0.4326 | 1.8624 | -0.3390 | 0.4286 | 2.2014 | -96.9273 | -48.9936 | -1.8301 | -1.8074 |
| 0.4694 | 3.41 | 210 | 0.4311 | 1.9971 | -0.4435 | 0.4286 | 2.4406 | -97.2756 | -48.5445 | -1.8368 | -1.8136 |
| 0.5431 | 3.9 | 240 | 0.4247 | 2.0881 | -0.5490 | 0.4286 | 2.6371 | -97.6273 | -48.2414 | -1.8401 | -1.8164 |
| 0.4547 | 4.39 | 270 | 0.4296 | 2.1700 | -0.6894 | 0.4286 | 2.8595 | -98.0954 | -47.9682 | -1.8433 | -1.8191 |
| 0.3606 | 4.88 | 300 | 0.4290 | 2.2236 | -0.7919 | 0.4286 | 3.0155 | -98.4369 | -47.7897 | -1.8460 | -1.8213 |
| 0.4021 | 5.37 | 330 | 0.4302 | 2.2553 | -0.9243 | 0.4286 | 3.1797 | -98.8783 | -47.6839 | -1.8471 | -1.8219 |
| 0.419 | 5.85 | 360 | 0.4336 | 2.2579 | -1.0063 | 0.4286 | 3.2642 | -99.1514 | -47.6751 | -1.8470 | -1.8214 |
| 0.3984 | 6.34 | 390 | 0.4291 | 2.2716 | -1.0712 | 0.4286 | 3.3428 | -99.3678 | -47.6296 | -1.8499 | -1.8243 |
| 0.435 | 6.83 | 420 | 0.4285 | 2.2724 | -1.1240 | 0.4286 | 3.3965 | -99.5441 | -47.6268 | -1.8495 | -1.8236 |
| 0.5148 | 7.32 | 450 | 0.4309 | 2.2693 | -1.2113 | 0.4286 | 3.4806 | -99.8349 | -47.6373 | -1.8482 | -1.8220 |
| 0.412 | 7.8 | 480 | 0.4308 | 2.2647 | -1.2626 | 0.4286 | 3.5273 | -100.0060 | -47.6527 | -1.8481 | -1.8217 |
| 0.4911 | 8.29 | 510 | 0.4331 | 2.2554 | -1.3097 | 0.4286 | 3.5651 | -100.1629 | -47.6835 | -1.8466 | -1.8200 |
| 0.4433 | 8.78 | 540 | 0.4317 | 2.2453 | -1.3403 | 0.4286 | 3.5855 | -100.2648 | -47.7174 | -1.8468 | -1.8201 |
| 0.3813 | 9.27 | 570 | 0.4338 | 2.2396 | -1.3854 | 0.4286 | 3.6250 | -100.4154 | -47.7365 | -1.8473 | -1.8205 |
| 0.5026 | 9.76 | 600 | 0.4333 | 2.2386 | -1.4022 | 0.4286 | 3.6408 | -100.4712 | -47.7397 | -1.8472 | -1.8203 |
| 0.3121 | 10.24 | 630 | 0.4324 | 2.2339 | -1.4158 | 0.4286 | 3.6497 | -100.5166 | -47.7553 | -1.8462 | -1.8193 |
| 0.4165 | 10.73 | 660 | 0.4319 | 2.2307 | -1.4318 | 0.4286 | 3.6625 | -100.5699 | -47.7659 | -1.8462 | -1.8193 |
| 0.5328 | 11.22 | 690 | 0.4329 | 2.2254 | -1.4478 | 0.4286 | 3.6732 | -100.6233 | -47.7837 | -1.8457 | -1.8186 |
| 0.4046 | 11.71 | 720 | 0.4335 | 2.2229 | -1.4565 | 0.4286 | 3.6793 | -100.6521 | -47.7921 | -1.8454 | -1.8183 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
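
A quick sanity check (a sketch, not part of the original card) that a local environment matches the pinned versions above:

```python
# Compare installed versions against those listed in "Framework versions".
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.10.0",
    "transformers": "4.38.2",
    "torch": "2.1.0+cu118",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
for mod in (peft, transformers, torch, datasets, tokenizers):
    name = mod.__name__
    status = "OK" if mod.__version__ == expected[name] else f"got {mod.__version__}"
    print(f"{name}: expected {expected[name]}, {status}")
```
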
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2019e7c2d0f74845a62bb36150fa64d108d81519b8f3e5b08838aa277d901f4
+oid sha256:4fc4205bc93d6795cf17821f662a891c4d28c9b793aea2722bad767836694c35
 size 13648432