End of training

8ab9a7b verified 3 months ago

4.62 kB

	---
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	datasets:
	- GaetanMichelet/chat-60_ft_task-2
	library_name: peft
	license: llama3.1
	tags:
	- alignment-handbook
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: Llama-31-8B_task-2_60-samples_config-4
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Llama-31-8B_task-2_60-samples_config-4

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the GaetanMichelet/chat-60_ft_task-2 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7166

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 150

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-------:\|:----:\|:---------------:\|
	\| 1.0749 \| 0.6957 \| 2 \| 1.0966 \|
	\| 1.0739 \| 1.7391 \| 5 \| 1.0942 \|
	\| 1.0883 \| 2.7826 \| 8 \| 1.0905 \|
	\| 1.0572 \| 3.8261 \| 11 \| 1.0844 \|
	\| 1.0814 \| 4.8696 \| 14 \| 1.0741 \|
	\| 1.0423 \| 5.9130 \| 17 \| 1.0622 \|
	\| 1.0626 \| 6.9565 \| 20 \| 1.0462 \|
	\| 1.0118 \| 8.0 \| 23 \| 1.0248 \|
	\| 1.0176 \| 8.6957 \| 25 \| 1.0099 \|
	\| 0.9728 \| 9.7391 \| 28 \| 0.9822 \|
	\| 0.9567 \| 10.7826 \| 31 \| 0.9527 \|
	\| 0.9202 \| 11.8261 \| 34 \| 0.9259 \|
	\| 0.9099 \| 12.8696 \| 37 \| 0.9015 \|
	\| 0.8806 \| 13.9130 \| 40 \| 0.8828 \|
	\| 0.7975 \| 14.9565 \| 43 \| 0.8661 \|
	\| 0.8572 \| 16.0 \| 46 \| 0.8533 \|
	\| 0.8342 \| 16.6957 \| 48 \| 0.8447 \|
	\| 0.8242 \| 17.7391 \| 51 \| 0.8331 \|
	\| 0.7954 \| 18.7826 \| 54 \| 0.8223 \|
	\| 0.8235 \| 19.8261 \| 57 \| 0.8122 \|
	\| 0.7896 \| 20.8696 \| 60 \| 0.8017 \|
	\| 0.7775 \| 21.9130 \| 63 \| 0.7933 \|
	\| 0.7315 \| 22.9565 \| 66 \| 0.7862 \|
	\| 0.7702 \| 24.0 \| 69 \| 0.7800 \|
	\| 0.7262 \| 24.6957 \| 71 \| 0.7756 \|
	\| 0.7683 \| 25.7391 \| 74 \| 0.7715 \|
	\| 0.7043 \| 26.7826 \| 77 \| 0.7656 \|
	\| 0.7314 \| 27.8261 \| 80 \| 0.7621 \|
	\| 0.7093 \| 28.8696 \| 83 \| 0.7586 \|
	\| 0.7047 \| 29.9130 \| 86 \| 0.7542 \|
	\| 0.707 \| 30.9565 \| 89 \| 0.7506 \|
	\| 0.7128 \| 32.0 \| 92 \| 0.7475 \|
	\| 0.676 \| 32.6957 \| 94 \| 0.7451 \|
	\| 0.7113 \| 33.7391 \| 97 \| 0.7420 \|
	\| 0.6733 \| 34.7826 \| 100 \| 0.7396 \|
	\| 0.698 \| 35.8261 \| 103 \| 0.7370 \|
	\| 0.6868 \| 36.8696 \| 106 \| 0.7339 \|
	\| 0.6633 \| 37.9130 \| 109 \| 0.7310 \|
	\| 0.675 \| 38.9565 \| 112 \| 0.7296 \|
	\| 0.6563 \| 40.0 \| 115 \| 0.7270 \|
	\| 0.64 \| 40.6957 \| 117 \| 0.7257 \|
	\| 0.6314 \| 41.7391 \| 120 \| 0.7242 \|
	\| 0.619 \| 42.7826 \| 123 \| 0.7225 \|
	\| 0.6256 \| 43.8261 \| 126 \| 0.7211 \|
	\| 0.634 \| 44.8696 \| 129 \| 0.7198 \|
	\| 0.5984 \| 45.9130 \| 132 \| 0.7185 \|
	\| 0.636 \| 46.9565 \| 135 \| 0.7176 \|
	\| 0.6084 \| 48.0 \| 138 \| 0.7173 \|
	\| 0.6068 \| 48.6957 \| 140 \| 0.7168 \|
	\| 0.5982 \| 49.7391 \| 143 \| 0.7166 \|
	\| 0.6024 \| 50.7826 \| 146 \| 0.7171 \|
	\| 0.5876 \| 51.8261 \| 149 \| 0.7170 \|
	\| 0.5852 \| 52.8696 \| 152 \| 0.7169 \|
	\| 0.5803 \| 53.9130 \| 155 \| 0.7175 \|
	\| 0.5794 \| 54.9565 \| 158 \| 0.7172 \|
	\| 0.5699 \| 56.0 \| 161 \| 0.7188 \|
	\| 0.5722 \| 56.6957 \| 163 \| 0.7192 \|


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- Pytorch 2.1.2+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1

	---
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	datasets:
	- GaetanMichelet/chat-60_ft_task-2
	library_name: peft
	license: llama3.1
	tags:
	- alignment-handbook
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: Llama-31-8B_task-2_60-samples_config-4
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Llama-31-8B_task-2_60-samples_config-4

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the GaetanMichelet/chat-60_ft_task-2 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7166

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 150

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-------:\|:----:\|:---------------:\|
	\| 1.0749 \| 0.6957 \| 2 \| 1.0966 \|
	\| 1.0739 \| 1.7391 \| 5 \| 1.0942 \|
	\| 1.0883 \| 2.7826 \| 8 \| 1.0905 \|
	\| 1.0572 \| 3.8261 \| 11 \| 1.0844 \|
	\| 1.0814 \| 4.8696 \| 14 \| 1.0741 \|
	\| 1.0423 \| 5.9130 \| 17 \| 1.0622 \|
	\| 1.0626 \| 6.9565 \| 20 \| 1.0462 \|
	\| 1.0118 \| 8.0 \| 23 \| 1.0248 \|
	\| 1.0176 \| 8.6957 \| 25 \| 1.0099 \|
	\| 0.9728 \| 9.7391 \| 28 \| 0.9822 \|
	\| 0.9567 \| 10.7826 \| 31 \| 0.9527 \|
	\| 0.9202 \| 11.8261 \| 34 \| 0.9259 \|
	\| 0.9099 \| 12.8696 \| 37 \| 0.9015 \|
	\| 0.8806 \| 13.9130 \| 40 \| 0.8828 \|
	\| 0.7975 \| 14.9565 \| 43 \| 0.8661 \|
	\| 0.8572 \| 16.0 \| 46 \| 0.8533 \|
	\| 0.8342 \| 16.6957 \| 48 \| 0.8447 \|
	\| 0.8242 \| 17.7391 \| 51 \| 0.8331 \|
	\| 0.7954 \| 18.7826 \| 54 \| 0.8223 \|
	\| 0.8235 \| 19.8261 \| 57 \| 0.8122 \|
	\| 0.7896 \| 20.8696 \| 60 \| 0.8017 \|
	\| 0.7775 \| 21.9130 \| 63 \| 0.7933 \|
	\| 0.7315 \| 22.9565 \| 66 \| 0.7862 \|
	\| 0.7702 \| 24.0 \| 69 \| 0.7800 \|
	\| 0.7262 \| 24.6957 \| 71 \| 0.7756 \|
	\| 0.7683 \| 25.7391 \| 74 \| 0.7715 \|
	\| 0.7043 \| 26.7826 \| 77 \| 0.7656 \|
	\| 0.7314 \| 27.8261 \| 80 \| 0.7621 \|
	\| 0.7093 \| 28.8696 \| 83 \| 0.7586 \|
	\| 0.7047 \| 29.9130 \| 86 \| 0.7542 \|
	\| 0.707 \| 30.9565 \| 89 \| 0.7506 \|
	\| 0.7128 \| 32.0 \| 92 \| 0.7475 \|
	\| 0.676 \| 32.6957 \| 94 \| 0.7451 \|
	\| 0.7113 \| 33.7391 \| 97 \| 0.7420 \|
	\| 0.6733 \| 34.7826 \| 100 \| 0.7396 \|
	\| 0.698 \| 35.8261 \| 103 \| 0.7370 \|
	\| 0.6868 \| 36.8696 \| 106 \| 0.7339 \|
	\| 0.6633 \| 37.9130 \| 109 \| 0.7310 \|
	\| 0.675 \| 38.9565 \| 112 \| 0.7296 \|
	\| 0.6563 \| 40.0 \| 115 \| 0.7270 \|
	\| 0.64 \| 40.6957 \| 117 \| 0.7257 \|
	\| 0.6314 \| 41.7391 \| 120 \| 0.7242 \|
	\| 0.619 \| 42.7826 \| 123 \| 0.7225 \|
	\| 0.6256 \| 43.8261 \| 126 \| 0.7211 \|
	\| 0.634 \| 44.8696 \| 129 \| 0.7198 \|
	\| 0.5984 \| 45.9130 \| 132 \| 0.7185 \|
	\| 0.636 \| 46.9565 \| 135 \| 0.7176 \|
	\| 0.6084 \| 48.0 \| 138 \| 0.7173 \|
	\| 0.6068 \| 48.6957 \| 140 \| 0.7168 \|
	\| 0.5982 \| 49.7391 \| 143 \| 0.7166 \|
	\| 0.6024 \| 50.7826 \| 146 \| 0.7171 \|
	\| 0.5876 \| 51.8261 \| 149 \| 0.7170 \|
	\| 0.5852 \| 52.8696 \| 152 \| 0.7169 \|
	\| 0.5803 \| 53.9130 \| 155 \| 0.7175 \|
	\| 0.5794 \| 54.9565 \| 158 \| 0.7172 \|
	\| 0.5699 \| 56.0 \| 161 \| 0.7188 \|
	\| 0.5722 \| 56.6957 \| 163 \| 0.7192 \|


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- Pytorch 2.1.2+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1