---
license: mit
library_name: peft
tags:
- trl
- kto
- KTO
- WeniGPT
- generated_from_trainer
base_model: HuggingFaceH4/zephyr-7b-beta
model-index:
- name: kto-test
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

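# kto-test

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), trained with KTO (Kahneman-Tversky Optimization) using TRL and PEFT. The training dataset was not recorded; the auto-generated card listed it as `None`.

It achieves the following results on the evaluation set. These are the same values as the step-300 row of the training-results table below:
- Loss: 0.0147
- Rewards/chosen: 5.6143
- Rewards/rejected: -31.0540
- Rewards/margins: 36.6683
- KL: 0.0
- Logps/chosen: -130.3461
- Logps/rejected: -503.4655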
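
KTO learns from completions that are individually labeled desirable ("chosen") or undesirable ("rejected"), rather than from preference pairs. The following is a minimal, hypothetical sketch of the per-example objective described in the KTO paper. It shows one way to read the reward and KL metrics above. It is not TRL's implementation. `beta=0.1`, the unit class weights, and the log-probability values in the usage lines are all assumptions, because this card does not record β.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """Implied reward: beta-scaled log-ratio of policy to reference likelihood."""
    return beta * (policy_logp - ref_logp)


def kto_loss(policy_logp: float, ref_logp: float, kl_estimate: float,
             desirable: bool, beta: float = 0.1,
             desirable_weight: float = 1.0, undesirable_weight: float = 1.0) -> float:
    """Per-example KTO loss (sketch; beta and the weights are assumed values).

    kl_estimate plays the role of the reference point z0; it is clamped at 0
    because a KL divergence cannot be negative.
    """
    z0 = max(0.0, kl_estimate)
    log_ratio = policy_logp - ref_logp
    if desirable:
        # Desirable ("chosen") completions are pushed above the reference point.
        return desirable_weight * (1.0 - sigmoid(beta * (log_ratio - z0)))
    # Undesirable ("rejected") completions are pushed below it.
    return undesirable_weight * (1.0 - sigmoid(beta * (z0 - log_ratio)))


def reward_margin(mean_chosen_reward: float, mean_rejected_reward: float) -> float:
    """Margin between the average desirable and undesirable rewards."""
    return mean_chosen_reward - mean_rejected_reward


if __name__ == "__main__":
    # Reported evaluation metrics from this card.
    print(round(reward_margin(5.6143, -31.0540), 4))  # 36.6683
    # Hypothetical log-probabilities: a chosen completion that the policy now prefers
    # more than the reference does gets a positive reward and a loss below 0.5.
    print(kto_reward(-130.0, -180.0, beta=0.1))
    print(kto_loss(-130.0, -180.0, kl_estimate=-2.0, desirable=True))
```

Under this reading, a large positive Rewards/margins means the policy has raised the likelihood of desirable completions and lowered that of undesirable ones, both measured relative to the reference model.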

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 786
- mixed_precision_training: Native AMP
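
The total batch sizes above follow from the per-device settings. For training, 2 examples per device × 2 GPUs × 8 accumulation steps gives 32 examples per optimizer step. For evaluation, 2 × 2 gives 4. The sketch below reproduces that arithmetic and a linear warmup-then-decay learning-rate schedule with the same shape as Transformers' `get_linear_schedule_with_warmup`. It assumes the warmup ratio is rounded up to a whole number of steps, giving ⌈0.03 × 786⌉ = 24. This is an illustration, not the library code.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03
TOTAL_STEPS = 786


def effective_batch_sizes() -> tuple[int, int]:
    """Examples per optimizer step (train) and per evaluation step."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    """Assumption: the warmup ratio is rounded up to a whole number of steps."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int = TOTAL_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup)


if __name__ == "__main__":
    print(effective_batch_sizes())  # (32, 4)
    print(warmup_steps())           # 24
    for step in (0, 12, 24, 405, 786):
        print(step, linear_lr(step))
```

With these settings, the learning rate reaches its peak of 2e-4 at step 24, falls to half that value around step 405, and reaches 0 at step 786.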

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | KL | Logps/chosen | Logps/rejected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:------:|:------------:|:--------------:|
| 371.0873 | 0.38 | 50 | 0.0440 | 4.6808 | -9.5189 | 14.1997 | 0.0150 | -139.6814 | -288.1148 |
| 57.9834 | 0.76 | 100 | 0.0275 | 5.1394 | -31.8945 | 37.0339 | 0.0 | -135.0947 | -511.8704 |
| 37.3685 | 1.14 | 150 | 0.0196 | 5.2556 | -27.1934 | 32.4491 | 0.0 | -133.9325 | -464.8599 |
| 3.6561 | 1.52 | 200 | 0.0162 | 5.4306 | -22.6310 | 28.0615 | 0.0 | -132.1833 | -419.2354 |
| 59.5367 | 1.9 | 250 | 0.0143 | 5.7355 | -31.1619 | 36.8974 | 0.0 | -129.1339 | -504.5448 |
| 13.1891 | 2.29 | 300 | 0.0147 | 5.6143 | -31.0540 | 36.6683 | 0.0 | -130.3461 | -503.4655 |
| 3.8532 | 2.67 | 350 | 0.0131 | 5.8860 | -26.4154 | 32.3014 | 0.0 | -127.6289 | -457.0801 |
| 3.7678 | 3.05 | 400 | 0.0162 | 5.9318 | -26.7524 | 32.6841 | 0.0 | -127.1711 | -460.4493 |
| 49.3456 | 3.43 | 450 | 0.0167 | 5.9252 | -28.7033 | 34.6286 | 0.0 | -127.2365 | -479.9590 |
| 12.2886 | 3.81 | 500 | 0.0164 | 6.0009 | -29.4493 | 35.4501 | 0.0 | -126.4803 | -487.4185 |
| 2.3745 | 4.19 | 550 | 0.0173 | 6.0124 | -29.9808 | 35.9932 | 0.0 | -126.3649 | -492.7338 |
| 0.46 | 4.57 | 600 | 0.0173 | 6.0060 | -30.4606 | 36.4666 | 0.0 | -126.4293 | -497.5318 |
| 7.7723 | 4.95 | 650 | 0.0180 | 6.0079 | -30.7030 | 36.7109 | 0.0 | -126.4096 | -499.9554 |
| 4.1333 | 5.33 | 700 | 0.0184 | 6.0037 | -30.8948 | 36.8984 | 0.0 | -126.4521 | -501.8734 |
| 1.6938 | 5.71 | 750 | 0.0183 | 6.0119 | -30.9672 | 36.9791 | 0.0 | -126.3704 | -502.5979 |
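
The evaluation rows can be checked with a few lines of Python. The sketch below transcribes some of the table's columns. It confirms that each Rewards/margins entry equals Rewards/chosen minus Rewards/rejected up to rounding. It also finds the lowest validation loss, 0.0131 at step 350, which belongs to a different row than the step-300 metrics reported at the top of this card. Finally, it computes the ratio between the change in Rewards/chosen and the change in Logps/chosen from the first row to the last. A value near 0.1 would be consistent with a KTO β of 0.1 scaling log-probability ratios against a fixed reference model. β is not recorded in this card, so treat that as an inference, not a documented hyperparameter.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins, logps/chosen)
# transcribed from the training-results table above.
ROWS = [
    (50, 0.0440, 4.6808, -9.5189, 14.1997, -139.6814),
    (100, 0.0275, 5.1394, -31.8945, 37.0339, -135.0947),
    (150, 0.0196, 5.2556, -27.1934, 32.4491, -133.9325),
    (200, 0.0162, 5.4306, -22.6310, 28.0615, -132.1833),
    (250, 0.0143, 5.7355, -31.1619, 36.8974, -129.1339),
    (300, 0.0147, 5.6143, -31.0540, 36.6683, -130.3461),
    (350, 0.0131, 5.8860, -26.4154, 32.3014, -127.6289),
    (400, 0.0162, 5.9318, -26.7524, 32.6841, -127.1711),
    (450, 0.0167, 5.9252, -28.7033, 34.6286, -127.2365),
    (500, 0.0164, 6.0009, -29.4493, 35.4501, -126.4803),
    (550, 0.0173, 6.0124, -29.9808, 35.9932, -126.3649),
    (600, 0.0173, 6.0060, -30.4606, 36.4666, -126.4293),
    (650, 0.0180, 6.0079, -30.7030, 36.7109, -126.4096),
    (700, 0.0184, 6.0037, -30.8948, 36.8984, -126.4521),
    (750, 0.0183, 6.0119, -30.9672, 36.9791, -126.3704),
]


def margins_consistent(rows, tol: float = 5e-4) -> bool:
    """True if every margin equals chosen - rejected within rounding tolerance."""
    return all(abs((chosen - rejected) - margin) <= tol
               for _, _, chosen, rejected, margin, _ in rows)


def best_validation_step(rows) -> tuple[int, float]:
    """Step with the lowest validation loss."""
    step, loss, *_ = min(rows, key=lambda row: row[1])
    return step, loss


def reward_to_logp_ratio(rows) -> float:
    """Change in Rewards/chosen divided by change in Logps/chosen, first vs last row."""
    first, last = rows[0], rows[-1]
    return (last[2] - first[2]) / (last[5] - first[5])


if __name__ == "__main__":
    print(margins_consistent(ROWS))              # True
    print(best_validation_step(ROWS))            # (350, 0.0131)
    print(round(reward_to_logp_ratio(ROWS), 3))  # 0.1
```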


### Framework versions

- PEFT 0.10.0
- Transformers 4.39.1
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.1