GaetanMichelet
/

Llama-31-8B_task-2_60-samples_config-3

alignment-handbook

Generated from Trainer

4-bit precision

Model card Files Files and versions Metrics Training metrics Community

Llama-31-8B_task-2_60-samples_config-3 / README.md

GaetanMichelet's picture

End of training

9cc3afa verified about 1 month ago

|

history blame contribute delete

No virus

3.45 kB

	---
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	datasets:
	- GaetanMichelet/chat-60_ft_task-2
	library_name: peft
	license: llama3.1
	tags:
	- alignment-handbook
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: Llama-31-8B_task-2_60-samples_config-3
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Llama-31-8B_task-2_60-samples_config-3

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the GaetanMichelet/chat-60_ft_task-2 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7156

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 150

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-------:\|:----:\|:---------------:\|
	\| 1.0751 \| 0.8696 \| 5 \| 1.0960 \|
	\| 1.1004 \| 1.9130 \| 11 \| 1.0923 \|
	\| 1.1552 \| 2.9565 \| 17 \| 1.0845 \|
	\| 1.0884 \| 4.0 \| 23 \| 1.0731 \|
	\| 1.0984 \| 4.8696 \| 28 \| 1.0593 \|
	\| 1.054 \| 5.9130 \| 34 \| 1.0363 \|
	\| 0.9646 \| 6.9565 \| 40 \| 1.0060 \|
	\| 0.9982 \| 8.0 \| 46 \| 0.9700 \|
	\| 0.9649 \| 8.8696 \| 51 \| 0.9380 \|
	\| 0.9161 \| 9.9130 \| 57 \| 0.9017 \|
	\| 0.8966 \| 10.9565 \| 63 \| 0.8722 \|
	\| 0.8314 \| 12.0 \| 69 \| 0.8468 \|
	\| 0.7747 \| 12.8696 \| 74 \| 0.8286 \|
	\| 0.8162 \| 13.9130 \| 80 \| 0.8081 \|
	\| 0.8422 \| 14.9565 \| 86 \| 0.7906 \|
	\| 0.7802 \| 16.0 \| 92 \| 0.7776 \|
	\| 0.7179 \| 16.8696 \| 97 \| 0.7692 \|
	\| 0.7191 \| 17.9130 \| 103 \| 0.7605 \|
	\| 0.6644 \| 18.9565 \| 109 \| 0.7524 \|
	\| 0.6898 \| 20.0 \| 115 \| 0.7456 \|
	\| 0.6776 \| 20.8696 \| 120 \| 0.7404 \|
	\| 0.6571 \| 21.9130 \| 126 \| 0.7338 \|
	\| 0.6177 \| 22.9565 \| 132 \| 0.7289 \|
	\| 0.6361 \| 24.0 \| 138 \| 0.7246 \|
	\| 0.6357 \| 24.8696 \| 143 \| 0.7214 \|
	\| 0.6767 \| 25.9130 \| 149 \| 0.7174 \|
	\| 0.5947 \| 26.9565 \| 155 \| 0.7170 \|
	\| 0.6182 \| 28.0 \| 161 \| 0.7156 \|
	\| 0.5899 \| 28.8696 \| 166 \| 0.7157 \|
	\| 0.5612 \| 29.9130 \| 172 \| 0.7162 \|
	\| 0.5545 \| 30.9565 \| 178 \| 0.7185 \|
	\| 0.5574 \| 32.0 \| 184 \| 0.7232 \|
	\| 0.5316 \| 32.8696 \| 189 \| 0.7254 \|
	\| 0.5276 \| 33.9130 \| 195 \| 0.7338 \|
	\| 0.4653 \| 34.9565 \| 201 \| 0.7407 \|


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- Pytorch 2.1.2+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1