WeniGPT-Agents-Mistral-1.0.0-SFT-1.0.22-DPO

This model is a DPO fine-tuned version of Weni/WeniGPT-Agents-Mistral-1.0.0-SFT-merged on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1118
  • Rewards/chosen: 1.8747
  • Rewards/rejected: -0.9846
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.8593
  • Logps/rejected: -241.0612
  • Logps/chosen: -117.5944
  • Logits/rejected: -1.8155
  • Logits/chosen: -1.8108
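The reward metrics above are internally consistent: in DPO, the reported margin is simply the chosen reward minus the rejected reward. A minimal sketch in plain Python (values copied from the list above; note the per-pair loss computed at the mean margin is the standard DPO sigmoid loss and need not equal the reported batch-averaged Loss of 0.1118):

```python
import math

# Evaluation metrics reported above
rewards_chosen = 1.8747
rewards_rejected = -0.9846

# Rewards/margins is chosen minus rejected
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 2.8593

# Per-pair DPO loss at this margin: -log(sigmoid(margin)).
# This is evaluated at the *mean* margin, so it differs from the
# batch-averaged evaluation Loss reported in the card.
loss_at_mean_margin = -math.log(1.0 / (1.0 + math.exp(-margin)))
```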

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 180
  • mixed_precision_training: Native AMP
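The total batch sizes above follow from the per-device settings: effective train batch = per-device batch × number of devices × gradient accumulation steps. A quick check in plain Python (values from the list above; the warmup-step count is my own derived figure, assuming the ceiling rounding used by recent Transformers versions, and is not reported in the card):

```python
import math

# Hyperparameters reported above
train_batch_size = 1   # per device
eval_batch_size = 1    # per device
num_devices = 4
gradient_accumulation_steps = 2
training_steps = 180
warmup_ratio = 0.03

# Effective batch sizes
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no gradient accumulation at eval

# Linear warmup length implied by the ratio (derived, assumes ceiling rounding)
warmup_steps = math.ceil(training_steps * warmup_ratio)

print(total_train_batch_size, total_eval_batch_size, warmup_steps)  # 8 4 6
```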

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5055 | 0.9677 | 30 | 0.4410 | 0.6284 | -0.0380 | 1.0 | 0.6664 | -231.5955 | -130.0576 | -1.8176 | -1.8109 |
| 0.2624 | 1.9355 | 60 | 0.2709 | 1.2835 | -0.2282 | 0.8571 | 1.5117 | -233.4975 | -123.5067 | -1.8171 | -1.8112 |
| 0.2034 | 2.9032 | 90 | 0.1831 | 1.6577 | -0.4392 | 0.8571 | 2.0969 | -235.6068 | -119.7638 | -1.8201 | -1.8148 |
| 0.1597 | 3.8710 | 120 | 0.1410 | 1.8326 | -0.6890 | 1.0 | 2.5216 | -238.1049 | -118.0154 | -1.8244 | -1.8193 |
| 0.1056 | 4.8387 | 150 | 0.1193 | 1.8704 | -0.8951 | 1.0 | 2.7655 | -240.1666 | -117.6375 | -1.8182 | -1.8134 |
| 0.1218 | 5.8065 | 180 | 0.1118 | 1.8747 | -0.9846 | 1.0 | 2.8593 | -241.0612 | -117.5944 | -1.8155 | -1.8108 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • Pytorch 2.1.0+cu118
  • Datasets 2.18.0
  • Tokenizers 0.19.1