	---
	library_name: transformers
	license: apache-2.0
	base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: IE_M2_1000steps_1e7rate_01beta_cSFTDPO
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# IE_M2_1000steps_1e7rate_01beta_cSFTDPO

	This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.3743
	- Rewards/chosen: -0.3291
	- Rewards/rejected: -6.1017
	- Rewards/accuracies: 0.4600
	- Rewards/margins: 5.7727
	- Logps/rejected: -102.0393
	- Logps/chosen: -45.4965
	- Logits/rejected: -2.8684
	- Logits/chosen: -2.8050
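
The reported margin can be checked against the chosen and rejected rewards, as in the minimal sketch below. It also explores one possible reading of the loss and accuracy figures. It assumes the default sigmoid DPO loss, where each pair contributes `-log(sigmoid(margin))`; the card does not state the loss type. The "tied pairs" split is purely hypothetical: it asks what the mean loss would be if 46% of pairs had a large positive margin (loss near 0) and the remaining 54% had a margin of exactly zero (loss of ln 2). The variable names are illustrative and not taken from the training code.

```python
import math

# Final evaluation metrics as reported in the card.
rewards_chosen = -0.3291
rewards_rejected = -6.1017
reported_margin = 5.7727
reported_accuracy = 0.4600
reported_loss = 0.3743

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(m: float) -> float:
    """Per-pair loss -log(sigmoid(m)) for reward margin m (sigmoid loss assumed)."""
    return math.log1p(math.exp(-m))


# Hypothetical split (an assumption, not stated in the card): pairs counted as
# "accurate" contribute ~0 loss, all other pairs have a margin of exactly 0.
tied_fraction = 1.0 - reported_accuracy
loss_if_tied = tied_fraction * dpo_sigmoid_loss(0.0)

print(f"margin from rewards: {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss under tied-pairs assumption: {loss_if_tied:.4f} (reported {reported_loss:.4f})")
```

If the two printed losses agree, that is consistent with roughly half of the evaluation pairs having identical chosen and rejected scores, but the card alone does not confirm this, so it is worth inspecting the evaluation data before relying on the accuracy figure.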

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-07
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
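
The sketch below shows how these values relate to each other. It derives the effective batch size, estimates the training set size from the epoch count reported at the final step (8.0 at step 1000), and evaluates a cosine schedule with linear warmup. The `lr_at` helper is a hypothetical stand-in that follows the usual warmup-then-half-cosine shape; it is not the scheduler code used for this run, and a single training device is assumed.

```python
import math

# Values taken from the hyperparameter list.
train_batch_size = 2
gradient_accumulation_steps = 2
learning_rate = 1e-7
warmup_steps = 100
training_steps = 1000

# Effective batch size per optimizer step (single device assumed).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# The results table reports epoch 8.0 at step 1000, which implies
# about 125 optimizer steps per epoch and about 500 training pairs.
steps_per_epoch = training_steps / 8.0
approx_train_examples = steps_per_epoch * total_train_batch_size


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
print("approximate training pairs:", approx_train_examples)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-07 when warmup ends at step 100 and decays to zero by step 1000.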

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.558 | 0.4 | 50 | 0.4553 | -0.0349 | -0.8002 | 0.4600 | 0.7653 | -49.0237 | -42.5545 | -2.9038 | -2.8422 |
	| 0.3818 | 0.8 | 100 | 0.3747 | -0.1730 | -3.5887 | 0.4600 | 3.4157 | -76.9091 | -43.9359 | -2.8759 | -2.8145 |
	| 0.3123 | 1.2 | 150 | 0.3744 | -0.2403 | -4.3676 | 0.4600 | 4.1273 | -84.6980 | -44.6088 | -2.8742 | -2.8132 |
	| 0.364 | 1.6 | 200 | 0.3744 | -0.2016 | -4.5800 | 0.4600 | 4.3784 | -86.8216 | -44.2215 | -2.8745 | -2.8130 |
	| 0.4332 | 2.0 | 250 | 0.3743 | -0.2684 | -4.8731 | 0.4600 | 4.6046 | -89.7525 | -44.8898 | -2.8737 | -2.8118 |
	| 0.3986 | 2.4 | 300 | 0.3743 | -0.1931 | -5.0362 | 0.4600 | 4.8430 | -91.3835 | -44.1367 | -2.8747 | -2.8125 |
	| 0.3986 | 2.8 | 350 | 0.3743 | -0.1846 | -5.1505 | 0.4600 | 4.9659 | -92.5268 | -44.0517 | -2.8745 | -2.8120 |
	| 0.4506 | 3.2 | 400 | 0.3743 | -0.1881 | -5.2928 | 0.4600 | 5.1047 | -93.9497 | -44.0868 | -2.8736 | -2.8107 |
	| 0.4505 | 3.6 | 450 | 0.3743 | -0.2250 | -5.5587 | 0.4600 | 5.3337 | -96.6092 | -44.4557 | -2.8724 | -2.8094 |
	| 0.4332 | 4.0 | 500 | 0.3743 | -0.4284 | -5.9879 | 0.4600 | 5.5595 | -100.9007 | -46.4892 | -2.8698 | -2.8066 |
	| 0.3292 | 4.4 | 550 | 0.3743 | -0.3669 | -5.9892 | 0.4600 | 5.6223 | -100.9135 | -45.8741 | -2.8695 | -2.8063 |
	| 0.3639 | 4.8 | 600 | 0.3743 | -0.2855 | -5.9594 | 0.4600 | 5.6739 | -100.6163 | -45.0607 | -2.8699 | -2.8066 |
	| 0.4505 | 5.2 | 650 | 0.3743 | -0.3591 | -6.0896 | 0.4600 | 5.7305 | -101.9183 | -45.7970 | -2.8685 | -2.8052 |
	| 0.4505 | 5.6 | 700 | 0.3743 | -0.3292 | -6.0868 | 0.4600 | 5.7576 | -101.8900 | -45.4977 | -2.8687 | -2.8054 |
	| 0.3639 | 6.0 | 750 | 0.3743 | -0.3284 | -6.1008 | 0.4600 | 5.7724 | -102.0299 | -45.4898 | -2.8683 | -2.8049 |
	| 0.2426 | 6.4 | 800 | 0.3743 | -0.3283 | -6.0983 | 0.4600 | 5.7700 | -102.0044 | -45.4881 | -2.8684 | -2.8051 |
	| 0.5025 | 6.8 | 850 | 0.3743 | -0.3251 | -6.0987 | 0.4600 | 5.7737 | -102.0092 | -45.4562 | -2.8685 | -2.8051 |
	| 0.3119 | 7.2 | 900 | 0.3743 | -0.3297 | -6.1009 | 0.4600 | 5.7712 | -102.0308 | -45.5028 | -2.8684 | -2.8050 |
	| 0.3466 | 7.6 | 950 | 0.3743 | -0.3291 | -6.1017 | 0.4600 | 5.7727 | -102.0393 | -45.4965 | -2.8684 | -2.8050 |
	| 0.3812 | 8.0 | 1000 | 0.3743 | -0.3291 | -6.1017 | 0.4600 | 5.7727 | -102.0393 | -45.4965 | -2.8684 | -2.8050 |


	### Framework versions

	- Transformers 4.44.2
	- Pytorch 2.0.0+cu117
	- Datasets 3.0.0
	- Tokenizers 0.19.1