---
license: mit
library_name: peft
tags:
  - trl
  - kto
  - generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
  - name: WeniGPT-QA-Zephyr-7B-5.0.0-KTO
    results: []
---

# WeniGPT-QA-Zephyr-7B-5.0.0-KTO

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0062
- Rewards/chosen: 6.6430
- Rewards/rejected: -36.7537
- Rewards/margins: 43.3967
- Kl: 0.1669
- Logps/chosen: -144.6907
- Logps/rejected: -566.5795

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | Kl | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 0.1437 | 0.38 | 50 | 0.0282 | 5.2842 | -20.1101 | 25.3943 | 0.0961 | -158.2786 | -400.1437 |
| 0.0615 | 0.76 | 100 | 0.0222 | 5.7502 | -18.4500 | 24.2003 | 0.5886 | -153.6186 | -383.5430 |
| 0.0346 | 1.14 | 150 | 0.0398 | 4.8839 | -41.3691 | 46.2529 | 0.3036 | -162.2825 | -612.7335 |
| 0.0563 | 1.52 | 200 | 0.0212 | 6.1746 | -26.4848 | 32.6594 | 0.1584 | -149.3753 | -463.8907 |
| 0.0533 | 1.9  | 250 | 0.0134 | 6.1913 | -29.0566 | 35.2479 | 0.4595 | -149.2076 | -489.6085 |
| 0.0076 | 2.28 | 300 | 0.0161 | 6.3153 | -25.4861 | 31.8015 | 0.6193 | -147.9676 | -453.9040 |
| 0.011  | 2.66 | 350 | 0.0120 | 6.3302 | -37.6836 | 44.0138 | 0.4913 | -147.8187 | -575.8787 |
| 0.0049 | 3.04 | 400 | 0.0102 | 6.3273 | -29.9323 | 36.2596 | 0.4649 | -147.8484 | -498.3662 |
| 0.0028 | 3.42 | 450 | 0.0083 | 6.5215 | -34.1028 | 40.6243 | 0.2949 | -145.9056 | -540.0707 |
| 0.0087 | 3.8  | 500 | 0.0096 | 6.4117 | -35.2134 | 41.6251 | 0.0923 | -147.0044 | -551.1769 |
| 0.004  | 4.18 | 550 | 0.0075 | 6.5708 | -37.6298 | 44.2006 | 0.1574 | -145.4131 | -575.3412 |
| 0.0036 | 4.56 | 600 | 0.0068 | 6.6432 | -36.6865 | 43.3297 | 0.1629 | -144.6893 | -565.9077 |
| 0.003  | 4.94 | 650 | 0.0064 | 6.6633 | -36.7249 | 43.3882 | 0.1661 | -144.4881 | -566.2917 |
| 0.0016 | 5.32 | 700 | 0.0062 | 6.6430 | -36.7537 | 43.3967 | 0.1669 | -144.6907 | -566.5795 |
| 0.0042 | 5.7  | 750 | 0.0062 | 6.6553 | -36.6367 | 43.2920 | 0.1671 | -144.5682 | -565.4096 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
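To reproduce this environment, the versions above can be pinned in a `requirements.txt` fragment like the following. The CUDA 11.8 PyTorch build and the `trl` line are assumptions: the cu118 wheel typically comes from PyTorch's own index, and the card tags `trl` without stating its version.

```text
peft==0.10.0
transformers==4.39.1
torch==2.1.0+cu118  # assumes install via --index-url https://download.pytorch.org/whl/cu118
datasets==2.18.0
tokenizers==0.15.2
trl                 # version not stated in the card
```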