llama3.1-cpo_j-full-0919

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

Loss: 2.0330
Rewards/chosen: -17.2716
Rewards/rejected: -17.4010
Rewards/accuracies: 0.5283
Rewards/margins: 0.1295
Logps/rejected: -174.0103
Logps/chosen: -172.7156
Logits/rejected: -0.7823
Logits/chosen: -0.8013
Nll Loss: 0.4931
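The reward metrics can be cross-checked against the log-probability metrics. The card does not name the trainer. The "cpo" in the model name and the metric names suggest TRL's `CPOTrainer`, and that is an assumption here. Under it, each logged reward is β times the policy log-probability of the response (the `Logps` values), and the margin is the mean of chosen minus rejected reward. β is not among the listed hyperparameters, so the value this sketch recovers is an inference from the numbers, not a reported setting.

```python
# Final evaluation metrics copied from the list above.
metrics = {
    "rewards/chosen": -17.2716,
    "rewards/rejected": -17.4010,
    "rewards/margins": 0.1295,
    "logps/chosen": -172.7156,
    "logps/rejected": -174.0103,
}

# Implied scale factor (beta) between rewards and log-probabilities,
# assuming reward = beta * logp as in a reference-free CPO-style reward.
beta_chosen = metrics["rewards/chosen"] / metrics["logps/chosen"]
beta_rejected = metrics["rewards/rejected"] / metrics["logps/rejected"]

# The mean margin equals mean chosen reward minus mean rejected reward;
# the last digit may differ because each metric is rounded separately.
margin = metrics["rewards/chosen"] - metrics["rewards/rejected"]

print(f"beta (chosen):   {beta_chosen:.4f}")
print(f"beta (rejected): {beta_rejected:.4f}")
print(f"margin: {margin:.4f} vs logged {metrics['rewards/margins']:.4f}")
```

Both ratios come out at 0.1, which is consistent with β = 0.1 if the assumption about the trainer holds.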

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
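The two total batch sizes follow from the per-device settings. This minimal check assumes the usual Transformers convention. For training, the effective batch is the per-device batch × the number of devices × the gradient-accumulation steps. Evaluation does not accumulate gradients, so its total is the per-device batch × the number of devices.

```python
# Per-device sizes and parallelism settings from the hyperparameter list.
train_batch_size = 4            # examples per GPU per forward/backward pass
eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 8

# Each optimizer step accumulates gradients over 8 micro-batches on each of 4 GPUs.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 16
```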

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Nll Loss
No log	0.0230	1	2.8864	-26.5532	-26.5849	0.5174	0.0317	-265.8489	-265.5324	-0.2622	-0.2859	1.1891
No log	0.0460	2	2.8852	-26.5433	-26.5669	0.5217	0.0236	-265.6693	-265.4330	-0.2636	-0.2872	1.1879
No log	0.0690	3	2.8825	-26.4975	-26.5200	0.5130	0.0225	-265.1996	-264.9747	-0.2640	-0.2874	1.1867
No log	0.0920	4	2.8709	-26.3594	-26.3890	0.5261	0.0296	-263.8900	-263.5942	-0.2712	-0.2946	1.1816
No log	0.1149	5	2.8528	-26.2159	-26.2470	0.5261	0.0311	-262.4702	-262.1591	-0.2743	-0.2975	1.1581
No log	0.1379	6	2.8240	-25.8178	-25.8572	0.5239	0.0394	-258.5723	-258.1781	-0.2926	-0.3156	1.1426
No log	0.1609	7	2.7885	-25.5238	-25.5623	0.5217	0.0385	-255.6230	-255.2376	-0.3115	-0.3343	1.1203
No log	0.1839	8	2.7394	-24.9173	-24.9665	0.5348	0.0491	-249.6646	-249.1732	-0.3523	-0.3759	1.0932
No log	0.2069	9	2.6997	-24.5346	-24.5855	0.5304	0.0509	-245.8549	-245.3461	-0.3716	-0.3958	1.0735
3.0132	0.2299	10	2.6736	-24.2278	-24.2821	0.5348	0.0543	-242.8206	-242.2775	-0.3888	-0.4137	1.0604
3.0132	0.2529	11	2.6514	-23.9833	-24.0509	0.5348	0.0676	-240.5090	-239.8333	-0.4023	-0.4275	1.0473
3.0132	0.2759	12	2.6159	-23.5112	-23.5825	0.5348	0.0712	-235.8245	-235.1123	-0.4367	-0.4619	1.0318
3.0132	0.2989	13	2.5754	-22.9830	-23.0617	0.5304	0.0787	-230.6171	-229.8301	-0.4755	-0.5012	0.9972
3.0132	0.3218	14	2.5425	-22.4882	-22.5719	0.5261	0.0837	-225.7186	-224.8815	-0.5193	-0.5447	0.9638
3.0132	0.3448	15	2.5079	-22.0390	-22.1303	0.5348	0.0913	-221.3027	-220.3899	-0.5581	-0.5829	0.9364
3.0132	0.3678	16	2.4829	-21.6296	-21.7212	0.5348	0.0917	-217.2125	-216.2958	-0.5988	-0.6233	0.9076
3.0132	0.3908	17	2.4569	-21.2310	-21.3319	0.5370	0.1010	-213.3194	-212.3095	-0.6344	-0.6586	0.8787
3.0132	0.4138	18	2.4349	-20.8876	-20.9881	0.5304	0.1005	-209.8813	-208.8761	-0.6694	-0.6929	0.8582
3.0132	0.4368	19	2.4174	-20.5639	-20.6694	0.5304	0.1055	-206.6935	-205.6389	-0.7053	-0.7288	0.8530
2.6371	0.4598	20	2.3954	-20.2042	-20.3148	0.5326	0.1105	-203.1477	-202.0424	-0.7411	-0.7640	0.8311
2.6371	0.4828	21	2.3761	-19.8765	-19.9851	0.5239	0.1086	-199.8509	-198.7649	-0.7733	-0.7956	0.8173
2.6371	0.5057	22	2.3550	-19.4793	-19.5897	0.5217	0.1104	-195.8972	-194.7931	-0.7947	-0.8166	0.7966
2.6371	0.5287	23	2.3190	-19.0685	-19.1814	0.5239	0.1129	-191.8135	-190.6849	-0.8095	-0.8311	0.7629
2.6371	0.5517	24	2.2593	-18.7171	-18.8248	0.5283	0.1077	-188.2483	-187.1710	-0.8197	-0.8403	0.7104
2.6371	0.5747	25	2.1824	-18.4482	-18.5591	0.5326	0.1109	-185.5911	-184.4822	-0.8234	-0.8436	0.6203
2.6371	0.5977	26	2.1139	-18.2639	-18.3699	0.5326	0.1059	-183.6985	-182.6391	-0.8213	-0.8404	0.5467
2.6371	0.6207	27	2.0862	-18.1268	-18.2372	0.5326	0.1104	-182.3718	-181.2681	-0.8158	-0.8341	0.5235
2.6371	0.6437	28	2.0741	-18.0305	-18.1407	0.5283	0.1103	-181.4072	-180.3046	-0.8051	-0.8225	0.5133
2.6371	0.6667	29	2.0690	-17.9415	-18.0517	0.5304	0.1101	-180.5167	-179.4155	-0.7987	-0.8158	0.5092
2.3737	0.6897	30	2.0669	-17.8450	-17.9553	0.5304	0.1103	-179.5531	-178.4496	-0.7900	-0.8066	0.5082
2.3737	0.7126	31	2.0595	-17.7753	-17.8928	0.5370	0.1175	-178.9280	-177.7533	-0.7924	-0.8090	0.5009
2.3737	0.7356	32	2.0559	-17.6972	-17.8134	0.5326	0.1162	-178.1344	-176.9719	-0.7821	-0.7989	0.5023
2.3737	0.7586	33	2.0530	-17.6212	-17.7447	0.5283	0.1235	-177.4470	-176.2120	-0.7772	-0.7941	0.4995
2.3737	0.7816	34	2.0495	-17.5594	-17.6781	0.5239	0.1187	-176.7806	-175.5940	-0.7770	-0.7941	0.4961
2.3737	0.8046	35	2.0463	-17.5069	-17.6289	0.5239	0.1220	-176.2891	-175.0691	-0.7765	-0.7938	0.4933
2.3737	0.8276	36	2.0454	-17.4648	-17.5832	0.5283	0.1184	-175.8317	-174.6475	-0.7759	-0.7937	0.4930
2.3737	0.8506	37	2.0385	-17.4124	-17.5404	0.5239	0.1280	-175.4043	-174.1244	-0.7766	-0.7948	0.4914
2.3737	0.8736	38	2.0369	-17.3727	-17.4968	0.5174	0.1241	-174.9679	-173.7269	-0.7789	-0.7972	0.4935
2.3737	0.8966	39	2.0370	-17.3371	-17.4632	0.5239	0.1262	-174.6325	-173.3709	-0.7812	-0.7995	0.4908
2.0780	0.9195	40	2.0331	-17.3114	-17.4457	0.5261	0.1343	-174.4572	-173.1142	-0.7830	-0.8020	0.4896
2.0780	0.9425	41	2.0353	-17.2892	-17.4183	0.5283	0.1291	-174.1829	-172.8922	-0.7830	-0.8019	0.4943
2.0780	0.9655	42	2.0323	-17.2779	-17.4112	0.5348	0.1333	-174.1118	-172.7786	-0.7816	-0.8008	0.4935
2.0780	0.9885	43	2.0330	-17.2716	-17.4010	0.5283	0.1295	-174.0103	-172.7156	-0.7823	-0.8013	0.4931
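The `linear` scheduler with `lr_scheduler_warmup_ratio: 0.1` can be sketched as follows. This sketch rests on two assumptions. First, the run had 43 optimizer steps in total, which is the last step logged in the table above. Second, it follows the formula used by Transformers' `get_linear_schedule_with_warmup`, where the warmup length is rounded up with `ceil`. `lr_at` is a hypothetical helper for illustration.

```python
import math

learning_rate = 1e-6
warmup_ratio = 0.1
total_steps = 43  # assumption: inferred from the last logged step (epoch 0.9885)

# Warmup length rounded up: ceil(43 * 0.1) = 5 steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to the peak LR, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, remaining)


for step in (0, 1, 5, 20, 43):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions, the learning rate reaches its peak of 1e-06 at step 5 and decays linearly to zero by step 43.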

Framework versions

Transformers 4.44.2
Pytorch 2.3.1
Datasets 2.21.0
Tokenizers 0.19.1

Repository: jbjeong91/llama3.1-cpo_j-full-0919