---
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
  - name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.5-DPO
    results: []
---

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.5-DPO

This model is a fine-tuned version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged) on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

- Loss: 0.4260
- Rewards/chosen: 0.9172
- Rewards/rejected: -0.6078
- Rewards/accuracies: 0.4643
- Rewards/margins: 1.5251
- Logps/rejected: -103.4404
- Logps/chosen: -46.9008
- Logits/rejected: -1.8652
- Logits/chosen: -1.8327
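
Since this is a PEFT adapter trained on top of the merged SFT base model, the snippet below is a minimal, hedged loading sketch: the adapter repo id is assumed to match the model-index name above, and the Portuguese prompt is purely illustrative.

```python
# Minimal inference sketch. The adapter repo id below is an assumption
# based on the model-index name; adjust it to wherever the adapter lives.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
adapter_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.5-DPO"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

inputs = tokenizer("Olá, como posso ajudar?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```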

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-setup sketch follows the list):

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 366
- mixed_precision_training: Native AMP
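
The card does not include the training script, so the following is only a sketch of how these hyperparameters map onto a TRL `DPOTrainer` run (API as of the TRL releases contemporary with Transformers 4.38). The toy preference pairs, the `beta` value, and the LoRA settings are assumptions; everything else mirrors the list above.

```python
# DPO training sketch under the hyperparameters listed above.
# Assumptions: beta=0.1, the LoRA config, and the toy dataset.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Toy preference pairs in the prompt/chosen/rejected format DPO expects.
pairs = Dataset.from_dict({
    "prompt": ["Question: What does DPO optimize?\nAnswer:"],
    "chosen": [" It raises the likelihood of preferred answers over rejected ones."],
    "rejected": [" I don't know."],
})

args = TrainingArguments(
    output_dir="WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.5-DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    max_steps=366,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision; Adam defaults match the card
)

trainer = DPOTrainer(
    model,
    args=args,
    beta=0.1,  # assumed; the card does not report beta
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),  # assumed
)
trainer.train()
```

With `peft_config` set and no explicit `ref_model`, TRL uses the base model with the adapter disabled as the frozen reference policy, which is the usual setup for PEFT-based DPO.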

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6635        | 0.49  | 30   | 0.6524          | 0.0904         | 0.0036           | 0.4643             | 0.0867          | -97.3259       | -55.1696     | -1.8044         | -1.7832       |
| 0.6026        | 0.98  | 60   | 0.5891          | 0.2506         | 0.0024           | 0.4643             | 0.2482          | -97.3380       | -53.5672     | -1.8099         | -1.7878       |
| 0.5387        | 1.46  | 90   | 0.5295          | 0.4396         | -0.0275          | 0.4643             | 0.4671          | -97.6369       | -51.6775     | -1.8181         | -1.7943       |
| 0.6033        | 1.95  | 120  | 0.4960          | 0.5751         | -0.0659          | 0.4643             | 0.6410          | -98.0210       | -50.3219     | -1.8261         | -1.8009       |
| 0.5042        | 2.44  | 150  | 0.4709          | 0.6967         | -0.1479          | 0.4643             | 0.8446          | -98.8407       | -49.1060     | -1.8331         | -1.8059       |
| 0.5087        | 2.93  | 180  | 0.4542          | 0.7878         | -0.2428          | 0.4643             | 1.0306          | -99.7900       | -48.1955     | -1.8425         | -1.8136       |
| 0.4874        | 3.41  | 210  | 0.4428          | 0.8442         | -0.3560          | 0.4643             | 1.2002          | -100.9220      | -47.6315     | -1.8520         | -1.8219       |
| 0.4229        | 3.9   | 240  | 0.4358          | 0.8750         | -0.4390          | 0.4643             | 1.3140          | -101.7521      | -47.3229     | -1.8575         | -1.8266       |
| 0.5295        | 4.39  | 270  | 0.4313          | 0.9026         | -0.4960          | 0.4643             | 1.3986          | -102.3219      | -47.0471     | -1.8607         | -1.8289       |
| 0.5466        | 4.88  | 300  | 0.4291          | 0.9119         | -0.5384          | 0.4643             | 1.4503          | -102.7461      | -46.9544     | -1.8629         | -1.8309       |
| 0.4339        | 5.37  | 330  | 0.4268          | 0.9152         | -0.5900          | 0.4643             | 1.5052          | -103.2623      | -46.9216     | -1.8644         | -1.8320       |
| 0.5438        | 5.85  | 360  | 0.4260          | 0.9172         | -0.6078          | 0.4643             | 1.5251          | -103.4404      | -46.9008     | -1.8652         | -1.8327       |
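
To read the reward columns: assuming TRL's standard DPO conventions (not stated in the card), each reward is the β-scaled log-ratio of the policy to the reference model on a completion, and the margin is the mean gap between chosen and rejected rewards, e.g. 0.9172 - (-0.6078) ≈ 1.5251 in the final row:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = \mathbb{E}\big[r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\big]
$$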

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2