---
license: mit
library_name: peft
tags:
  - trl
  - kto
  - generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
  - name: WeniGPT-QA-Zephyr-7B-5.0.0-KTO
    results: []
---

# WeniGPT-QA-Zephyr-7B-5.0.0-KTO

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0062
- Rewards/chosen: 6.6430
- Rewards/rejected: -36.7537
- Rewards/margins: 43.3967
- Kl: 0.1669
- Logps/chosen: -144.6907
- Logps/rejected: -566.5795

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | Kl | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 0.1437 | 0.38 | 50 | 0.0282 | 5.2842 | -20.1101 | 25.3943 | 0.0961 | -158.2786 | -400.1437 |
| 0.0615 | 0.76 | 100 | 0.0222 | 5.7502 | -18.4500 | 24.2003 | 0.5886 | -153.6186 | -383.5430 |
| 0.0346 | 1.14 | 150 | 0.0398 | 4.8839 | -41.3691 | 46.2529 | 0.3036 | -162.2825 | -612.7335 |
| 0.0563 | 1.52 | 200 | 0.0212 | 6.1746 | -26.4848 | 32.6594 | 0.1584 | -149.3753 | -463.8907 |
| 0.0533 | 1.9  | 250 | 0.0134 | 6.1913 | -29.0566 | 35.2479 | 0.4595 | -149.2076 | -489.6085 |
| 0.0076 | 2.28 | 300 | 0.0161 | 6.3153 | -25.4861 | 31.8015 | 0.6193 | -147.9676 | -453.9040 |
| 0.011  | 2.66 | 350 | 0.0120 | 6.3302 | -37.6836 | 44.0138 | 0.4913 | -147.8187 | -575.8787 |
| 0.0049 | 3.04 | 400 | 0.0102 | 6.3273 | -29.9323 | 36.2596 | 0.4649 | -147.8484 | -498.3662 |
| 0.0028 | 3.42 | 450 | 0.0083 | 6.5215 | -34.1028 | 40.6243 | 0.2949 | -145.9056 | -540.0707 |
| 0.0087 | 3.8  | 500 | 0.0096 | 6.4117 | -35.2134 | 41.6251 | 0.0923 | -147.0044 | -551.1769 |
| 0.004  | 4.18 | 550 | 0.0075 | 6.5708 | -37.6298 | 44.2006 | 0.1574 | -145.4131 | -575.3412 |
| 0.0036 | 4.56 | 600 | 0.0068 | 6.6432 | -36.6865 | 43.3297 | 0.1629 | -144.6893 | -565.9077 |
| 0.003  | 4.94 | 650 | 0.0064 | 6.6633 | -36.7249 | 43.3882 | 0.1661 | -144.4881 | -566.2917 |
| 0.0016 | 5.32 | 700 | 0.0062 | 6.6430 | -36.7537 | 43.3967 | 0.1669 | -144.6907 | -566.5795 |
| 0.0042 | 5.7  | 750 | 0.0062 | 6.6553 | -36.6367 | 43.2920 | 0.1671 | -144.5682 | -565.4096 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
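To reproduce this environment, the versions above can be pinned in a `requirements.txt` fragment like the following. The CUDA 11.8 PyTorch build and the `trl` line are assumptions: the cu118 wheel typically comes from PyTorch's own index, and the card tags `trl` without stating its version.

```text
peft==0.10.0
transformers==4.39.1
torch==2.1.0+cu118  # assumes install via --index-url https://download.pytorch.org/whl/cu118
datasets==2.18.0
tokenizers==0.15.2
trl                 # version not stated in the card
```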