metadata

library_name: transformers
tags:
  - trl
  - cpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-CPO
    results: []

OpenELM-1_1B-CPO

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set; they match the last row of the training results table (step 2800):

Logits/chosen: -8.875
Logits/rejected: -7.5312
Logps/chosen: -364.0
Logps/rejected: -444.0
Loss: 2.1904
Nll Loss: 1.1719
Rewards/accuracies: 0.5918
Rewards/chosen: -3.6406
Rewards/margins: 0.8008
Rewards/rejected: -4.4375
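As a quick sanity check on the metrics above, the reward margin can be recomputed from the chosen and rejected rewards. The sketch below assumes the usual convention in TRL's preference trainers, where the margin is the mean of chosen minus rejected rewards. The variable names are illustrative only.

```python
# Values copied from the evaluation summary above.
rewards_chosen = -3.6406
rewards_rejected = -4.4375
reported_margin = 0.8008

# Assumed convention: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
print(f"difference:      {abs(computed_margin - reported_margin):.4f}")
```

The recomputed value is close to, but not identical with, the reported one. A plausible explanation, not confirmed by the card, is that the logged values are rounded or were accumulated in reduced precision.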

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
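The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. A minimal sketch of that arithmetic, assuming train_batch_size and eval_batch_size are per-device values (which is consistent with the reported totals):

```python
# Hyperparameters copied from the list above.
train_batch_size = 8             # assumed to be per device
eval_batch_size = 16             # assumed to be per device
num_devices = 4
gradient_accumulation_steps = 2

# Examples consumed per optimizer step across all devices.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Gradient accumulation does not apply during evaluation.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)
print(total_eval_batch_size)
```

Both values come out to 64, matching total_train_batch_size and total_eval_batch_size in the list.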

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Nll Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
2.4271	0.1047	100	-12.3125	-12.125	-336.0	-328.0	2.2959	1.0859	0.4980	-3.3594	-0.0850	-3.2812
2.2538	0.2093	200	-9.875	-9.5	-338.0	-346.0	2.1836	1.0938	0.5234	-3.3906	0.0640	-3.4531
2.1253	0.3140	300	-11.4375	-11.0	-346.0	-360.0	2.1307	1.1172	0.5176	-3.4531	0.1416	-3.5938
2.0609	0.4186	400	-11.125	-10.625	-332.0	-344.0	2.1359	1.0703	0.5293	-3.3281	0.1187	-3.4375
2.1905	0.5233	500	-9.3125	-8.5	-338.0	-352.0	2.1286	1.0859	0.5254	-3.375	0.1357	-3.5156
2.1304	0.6279	600	-10.625	-9.625	-360.0	-398.0	2.1410	1.1562	0.5723	-3.6094	0.3672	-3.9688
2.2554	0.7326	700	-9.6875	-8.5625	-374.0	-416.0	2.1848	1.2031	0.5664	-3.7344	0.4258	-4.1562
2.0796	0.8373	800	-7.8438	-7.0312	-346.0	-374.0	2.1224	1.1172	0.5469	-3.4531	0.2852	-3.75
2.1021	0.9419	900	-6.2812	-5.2812	-350.0	-390.0	2.1099	1.1328	0.5723	-3.5	0.4062	-3.9062
1.5182	1.0471	1000	-10.625	-9.375	-350.0	-386.0	2.1662	1.125	0.5664	-3.5	0.3633	-3.8594
1.4917	1.1518	1100	-7.875	-6.4688	-356.0	-400.0	2.1588	1.1484	0.5703	-3.5625	0.4395	-4.0
1.5219	1.2564	1200	-7.7812	-6.6562	-364.0	-420.0	2.1449	1.1719	0.5938	-3.625	0.5586	-4.1875
1.5292	1.3611	1300	-8.875	-7.75	-354.0	-402.0	2.1489	1.1406	0.5742	-3.5312	0.4785	-4.0
1.4257	1.4657	1400	-9.25	-7.7188	-358.0	-410.0	2.1193	1.1562	0.5801	-3.5781	0.5156	-4.0938
1.4366	1.5704	1500	-8.9375	-7.6875	-358.0	-416.0	2.0983	1.1562	0.5898	-3.5938	0.5586	-4.1562
1.5246	1.6750	1600	-6.9062	-5.4688	-358.0	-420.0	2.1191	1.1562	0.5938	-3.5781	0.625	-4.2188
1.4534	1.7797	1700	-10.0625	-9.0625	-348.0	-404.0	2.0829	1.1172	0.5762	-3.4688	0.5625	-4.0312
1.4551	1.8844	1800	-8.1875	-6.8438	-356.0	-416.0	2.1033	1.1484	0.5898	-3.5625	0.6016	-4.1562
1.4969	1.9890	1900	-9.3125	-8.125	-354.0	-412.0	2.1046	1.1406	0.5762	-3.5312	0.5938	-4.125
0.9984	2.0937	2000	-9.1875	-7.9375	-364.0	-428.0	2.1806	1.1719	0.5781	-3.6406	0.6367	-4.2812
0.9885	2.1983	2100	-8.6875	-7.4062	-370.0	-448.0	2.1927	1.1875	0.5801	-3.6875	0.7930	-4.5
0.9814	2.3030	2200	-8.8125	-7.5	-362.0	-436.0	2.1867	1.1719	0.5742	-3.625	0.7266	-4.3438
0.9844	2.4076	2300	-8.375	-7.125	-368.0	-452.0	2.1905	1.1875	0.5996	-3.6875	0.8438	-4.5312
0.9931	2.5123	2400	-8.6875	-7.375	-364.0	-442.0	2.1843	1.1719	0.5820	-3.6406	0.7930	-4.4375
0.9537	2.6170	2500	-8.8125	-7.5	-364.0	-446.0	2.1907	1.1719	0.5898	-3.6406	0.8125	-4.4688
0.9512	2.7216	2600	-8.8125	-7.5	-364.0	-446.0	2.1918	1.1719	0.5898	-3.6406	0.8086	-4.4375
0.9604	2.8263	2700	-8.875	-7.5312	-364.0	-442.0	2.1906	1.1719	0.5879	-3.6406	0.7969	-4.4375
1.0208	2.9309	2800	-8.875	-7.5312	-364.0	-444.0	2.1904	1.1719	0.5918	-3.6406	0.8008	-4.4375
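To see where validation loss bottoms out, the table can be scanned for its minimum. The sketch below copies the (step, validation loss) pairs from the table by hand; the variable names are illustrative and not part of any training script.

```python
# (step, validation loss) pairs copied from the training results table.
eval_loss_by_step = [
    (100, 2.2959), (200, 2.1836), (300, 2.1307), (400, 2.1359),
    (500, 2.1286), (600, 2.1410), (700, 2.1848), (800, 2.1224),
    (900, 2.1099), (1000, 2.1662), (1100, 2.1588), (1200, 2.1449),
    (1300, 2.1489), (1400, 2.1193), (1500, 2.0983), (1600, 2.1191),
    (1700, 2.0829), (1800, 2.1033), (1900, 2.1046), (2000, 2.1806),
    (2100, 2.1927), (2200, 2.1867), (2300, 2.1905), (2400, 2.1843),
    (2500, 2.1907), (2600, 2.1918), (2700, 2.1906), (2800, 2.1904),
]

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_loss_by_step, key=lambda pair: pair[1])

# Last logged checkpoint, which the evaluation summary reports.
final_step, final_loss = eval_loss_by_step[-1]

print(f"best:  step {best_step}, validation loss {best_loss}")
print(f"final: step {final_step}, validation loss {final_loss}")
```

The lowest validation loss is at step 1700 (2.0829), while the final checkpoint at step 2800 sits at 2.1904. Over the same span the training loss falls from about 1.45 to about 1.0, a pattern that may indicate overfitting in the third epoch, although the reward margin and accuracy continue to improve.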

Framework versions

Transformers 4.44.2
PyTorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1