---
license: mit
library_name: peft
tags:
  - trl
  - kto
  - KTO
  - WeniGPT
  - generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
  - name: kto-test
    results: []
---

# kto-test

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0147
- Rewards/chosen: 5.6143
- Rewards/rejected: -31.0540
- Rewards/margins: 36.6683
- KL: 0.0
- Logps/chosen: -130.3461
- Logps/rejected: -503.4655
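The reward margin reported above is simply the chosen reward minus the rejected reward; a quick sanity check of the final evaluation figures:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected,
# using the evaluation numbers reported above.
rewards_chosen = 5.6143
rewards_rejected = -31.0540

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # → 36.6683
```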

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP
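The effective batch size and warmup length follow from the values above. A minimal check, assuming total_train_batch_size = per-device batch size × num_devices × gradient_accumulation_steps, and warmup steps ≈ warmup_ratio × training_steps (as with the linear scheduler in Transformers):

```python
train_batch_size = 2           # per-device
num_devices = 2
gradient_accumulation_steps = 8
warmup_ratio = 0.03
training_steps = 786

# Effective (total) training batch size
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # → 32, matching the value listed above

# Approximate number of linear-warmup steps (Transformers rounds internally)
print(round(warmup_ratio * training_steps, 2))  # → 23.58
```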

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 371.0873      | 0.38  | 50   | 0.0440          | 4.6808         | -9.5189          | 14.1997         | 0.0150 | -139.6814    | -288.1148      |
| 57.9834       | 0.76  | 100  | 0.0275          | 5.1394         | -31.8945         | 37.0339         | 0.0    | -135.0947    | -511.8704      |
| 37.3685       | 1.14  | 150  | 0.0196          | 5.2556         | -27.1934         | 32.4491         | 0.0    | -133.9325    | -464.8599      |
| 3.6561        | 1.52  | 200  | 0.0162          | 5.4306         | -22.6310         | 28.0615         | 0.0    | -132.1833    | -419.2354      |
| 59.5367       | 1.9   | 250  | 0.0143          | 5.7355         | -31.1619         | 36.8974         | 0.0    | -129.1339    | -504.5448      |
| 13.1891       | 2.29  | 300  | 0.0147          | 5.6143         | -31.0540         | 36.6683         | 0.0    | -130.3461    | -503.4655      |
| 3.8532        | 2.67  | 350  | 0.0131          | 5.8860         | -26.4154         | 32.3014         | 0.0    | -127.6289    | -457.0801      |
| 3.7678        | 3.05  | 400  | 0.0162          | 5.9318         | -26.7524         | 32.6841         | 0.0    | -127.1711    | -460.4493      |
| 49.3456       | 3.43  | 450  | 0.0167          | 5.9252         | -28.7033         | 34.6286         | 0.0    | -127.2365    | -479.9590      |
| 12.2886       | 3.81  | 500  | 0.0164          | 6.0009         | -29.4493         | 35.4501         | 0.0    | -126.4803    | -487.4185      |
| 2.3745        | 4.19  | 550  | 0.0173          | 6.0124         | -29.9808         | 35.9932         | 0.0    | -126.3649    | -492.7338      |
| 0.46          | 4.57  | 600  | 0.0173          | 6.0060         | -30.4606         | 36.4666         | 0.0    | -126.4293    | -497.5318      |
| 7.7723        | 4.95  | 650  | 0.0180          | 6.0079         | -30.7030         | 36.7109         | 0.0    | -126.4096    | -499.9554      |
| 4.1333        | 5.33  | 700  | 0.0184          | 6.0037         | -30.8948         | 36.8984         | 0.0    | -126.4521    | -501.8734      |
| 1.6938        | 5.71  | 750  | 0.0183          | 6.0119         | -30.9672         | 36.9791         | 0.0    | -126.3704    | -502.5979      |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.1