Llama-31-8B_task-1_120-samples_config-4_full

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-1 and the GaetanMichelet/chat-120_ft_task-1 datasets. It achieves the following results on the evaluation set:

Loss: 0.9042

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 150

Training results

Training Loss	Epoch	Step	Validation Loss
2.4687	0.9091	5	2.4589
2.5083	2.0	11	2.4440
2.4676	2.9091	16	2.4218
2.4562	4.0	22	2.3870
2.377	4.9091	27	2.3475
2.3303	6.0	33	2.2793
2.2553	6.9091	38	2.2254
2.174	8.0	44	2.1392
2.131	8.9091	49	2.0661
2.0142	10.0	55	1.9626
1.8873	10.9091	60	1.8746
1.7633	12.0	66	1.7650
1.726	12.9091	71	1.6563
1.5711	14.0	77	1.5123
1.4344	14.9091	82	1.3950
1.3201	16.0	88	1.2661
1.1787	16.9091	93	1.1831
1.1444	18.0	99	1.1188
1.0591	18.9091	104	1.0836
1.0151	20.0	110	1.0540
1.0277	20.9091	115	1.0388
1.0025	22.0	121	1.0250
1.0161	22.9091	126	1.0154
0.9946	24.0	132	1.0047
0.9773	24.9091	137	0.9970
0.9708	26.0	143	0.9890
0.9374	26.9091	148	0.9822
0.9403	28.0	154	0.9751
0.94	28.9091	159	0.9703
0.902	30.0	165	0.9633
0.9215	30.9091	170	0.9604
0.8854	32.0	176	0.9548
0.96	32.9091	181	0.9503
0.9162	34.0	187	0.9453
0.8686	34.9091	192	0.9429
0.906	36.0	198	0.9385
0.8762	36.9091	203	0.9354
0.8929	38.0	209	0.9332
0.8687	38.9091	214	0.9301
0.8933	40.0	220	0.9279
0.858	40.9091	225	0.9241
0.8481	42.0	231	0.9223
0.8228	42.9091	236	0.9217
0.8593	44.0	242	0.9186
0.8238	44.9091	247	0.9156
0.8081	46.0	253	0.9161
0.8327	46.9091	258	0.9129
0.8029	48.0	264	0.9110
0.7909	48.9091	269	0.9094
0.7826	50.0	275	0.9079
0.773	50.9091	280	0.9122
0.7377	52.0	286	0.9078
0.7491	52.9091	291	0.9050
0.7414	54.0	297	0.9093
0.7275	54.9091	302	0.9053
0.7198	56.0	308	0.9046
0.7203	56.9091	313	0.9093
0.6903	58.0	319	0.9042
0.6987	58.9091	324	0.9107
0.7141	60.0	330	0.9079
0.7023	60.9091	335	0.9120
0.6945	62.0	341	0.9087
0.6897	62.9091	346	0.9130
0.6597	64.0	352	0.9134
0.6954	64.9091	357	0.9120

Framework versions

PEFT 0.12.0
Transformers 4.44.0
Pytorch 2.1.2+cu121
Datasets 2.20.0
Tokenizers 0.19.1

GaetanMichelet
/

Llama-31-8B_task-1_120-samples_config-4_full

Llama-31-8B_task-1_120-samples_config-4_full

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for GaetanMichelet/Llama-31-8B_task-1_120-samples_config-4_full

Collection including GaetanMichelet/Llama-31-8B_task-1_120-samples_config-4_full

Configurations choice

Evaluation results