---
license: mit
library_name: peft
tags:
  - trl
  - kto
  - KTO
  - WeniGPT
  - generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
  - name: kto-test
    results: []
---

# kto-test

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0147
- Rewards/chosen: 5.6143
- Rewards/rejected: -31.0540
- Rewards/margins: 36.6683
- KL: 0.0
- Logps/chosen: -130.3461
- Logps/rejected: -503.4655
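The reward margin reported above is simply the chosen reward minus the rejected reward; a quick sanity check of the final evaluation figures:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected,
# using the evaluation numbers reported above.
rewards_chosen = 5.6143
rewards_rejected = -31.0540

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # → 36.6683
```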

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP
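The effective batch size and warmup length follow from the values above. A minimal check, assuming total_train_batch_size = per-device batch size × num_devices × gradient_accumulation_steps, and warmup steps ≈ warmup_ratio × training_steps (as with the linear scheduler in Transformers):

```python
train_batch_size = 2           # per-device
num_devices = 2
gradient_accumulation_steps = 8
warmup_ratio = 0.03
training_steps = 786

# Effective (total) training batch size
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # → 32, matching the value listed above

# Approximate number of linear-warmup steps (Transformers rounds internally)
print(round(warmup_ratio * training_steps, 2))  # → 23.58
```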

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 371.0873      | 0.38  | 50   | 0.0440          | 4.6808         | -9.5189          | 14.1997         | 0.0150 | -139.6814    | -288.1148      |
| 57.9834       | 0.76  | 100  | 0.0275          | 5.1394         | -31.8945         | 37.0339         | 0.0    | -135.0947    | -511.8704      |
| 37.3685       | 1.14  | 150  | 0.0196          | 5.2556         | -27.1934         | 32.4491         | 0.0    | -133.9325    | -464.8599      |
| 3.6561        | 1.52  | 200  | 0.0162          | 5.4306         | -22.6310         | 28.0615         | 0.0    | -132.1833    | -419.2354      |
| 59.5367       | 1.9   | 250  | 0.0143          | 5.7355         | -31.1619         | 36.8974         | 0.0    | -129.1339    | -504.5448      |
| 13.1891       | 2.29  | 300  | 0.0147          | 5.6143         | -31.0540         | 36.6683         | 0.0    | -130.3461    | -503.4655      |
| 3.8532        | 2.67  | 350  | 0.0131          | 5.8860         | -26.4154         | 32.3014         | 0.0    | -127.6289    | -457.0801      |
| 3.7678        | 3.05  | 400  | 0.0162          | 5.9318         | -26.7524         | 32.6841         | 0.0    | -127.1711    | -460.4493      |
| 49.3456       | 3.43  | 450  | 0.0167          | 5.9252         | -28.7033         | 34.6286         | 0.0    | -127.2365    | -479.9590      |
| 12.2886       | 3.81  | 500  | 0.0164          | 6.0009         | -29.4493         | 35.4501         | 0.0    | -126.4803    | -487.4185      |
| 2.3745        | 4.19  | 550  | 0.0173          | 6.0124         | -29.9808         | 35.9932         | 0.0    | -126.3649    | -492.7338      |
| 0.46          | 4.57  | 600  | 0.0173          | 6.0060         | -30.4606         | 36.4666         | 0.0    | -126.4293    | -497.5318      |
| 7.7723        | 4.95  | 650  | 0.0180          | 6.0079         | -30.7030         | 36.7109         | 0.0    | -126.4096    | -499.9554      |
| 4.1333        | 5.33  | 700  | 0.0184          | 6.0037         | -30.8948         | 36.8984         | 0.0    | -126.4521    | -501.8734      |
| 1.6938        | 5.71  | 750  | 0.0183          | 6.0119         | -30.9672         | 36.9791         | 0.0    | -126.3704    | -502.5979      |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.1