OpenELM-1_1B-DPO-full-most-similar

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.2342
Rewards/chosen: -6.8438
Rewards/rejected: -7.25
Rewards/accuracies: 0.5410
Rewards/margins: 0.4062
Logps/rejected: -1016.0
Logps/chosen: -1004.0
Logits/rejected: -4.9688
Logits/chosen: -6.25
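The metric names above appear to match those logged by TRL's `DPOTrainer`. The card names neither the training library, the DPO loss variant, nor β. Assuming that trainer and its default sigmoid loss, each reward is β times the gap between the policy's and the frozen reference model's log-probability of a response. The loss is -log σ(reward_chosen - reward_rejected). The sketch below recomputes the logged quantities from per-pair log-probabilities. `dpo_metrics`, `softplus` and the β value of 0.1 are illustrative assumptions. The Logits/* entries are mean policy logits and are not reproduced here.

```python
import math
from statistics import mean

# ASSUMPTION: the card does not report the DPO beta; 0.1 is TRL's default
# and is used here purely for illustration.
BETA = 0.1


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Batch means of the quantities logged as Loss, Rewards/* and Logps/*.

    Each argument is a list of summed response log-probabilities, one entry per
    preference pair; ref_* come from the frozen reference model (hypothetical helper).
    """
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # Sigmoid DPO loss per pair: -log(sigmoid(margin)) == softplus(-margin)
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response earns the higher reward
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }
```

Because a mean of differences equals the difference of means, Rewards/margins should equal Rewards/chosen minus Rewards/rejected. That holds for the figures above: -6.8438 - (-7.25) = 0.4062. A Rewards/accuracies of 0.5410 means the chosen response outscored the rejected one on about 54% of evaluation pairs.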

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
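The two totals follow from the per-device settings. Training uses 8 per device × 4 devices × 2 accumulation steps = 64 examples per optimizer step. Evaluation uses 16 × 4 = 64 examples per step, since there is no accumulation at evaluation time. The sketch below reproduces that arithmetic and the shape of a warmup-plus-cosine schedule. It mirrors `transformers`' `get_cosine_schedule_with_warmup` (default `num_cycles=0.5`) and assumes the Trainer's convention of `ceil(total_steps * warmup_ratio)` warmup steps. The card does not report the total number of optimizer steps, so `total_steps` is a parameter, and the value 1,000 used in the check below is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8   # per device
EVAL_BATCH_SIZE = 16   # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 5e-5
WARMUP_RATIO = 0.1


def effective_batch_sizes():
    """Examples consumed per optimizer step (train) and per eval step."""
    total_train = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
    total_eval = EVAL_BATCH_SIZE * NUM_DEVICES  # no gradient accumulation at eval
    return total_train, total_eval


def cosine_lr_with_warmup(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With a hypothetical `total_steps=1000`, the learning rate climbs linearly to 5e-05 at step 100, falls to 2.5e-05 at step 550, and reaches 0 at step 1000.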

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6243	0.1047	100	0.6808	-0.4668	-0.5312	0.5488	0.0625	-342.0	-364.0	-12.1875	-12.4375
0.6169	0.2093	200	0.6945	-0.5977	-0.6367	0.5059	0.0393	-352.0	-378.0	-11.1875	-11.5625
0.6093	0.3140	300	0.7121	-0.7305	-0.7461	0.4941	0.0176	-364.0	-392.0	-10.5	-11.0
0.6477	0.4186	400	0.6795	-1.7422	-1.8516	0.5488	0.1094	-474.0	-492.0	-13.1875	-13.5625
0.6297	0.5233	500	0.7219	-1.6719	-1.8281	0.5430	0.1602	-472.0	-486.0	-11.875	-12.3125
0.6442	0.6279	600	0.7078	-1.125	-1.1484	0.5039	0.0232	-404.0	-432.0	-9.3125	-9.75
0.6076	0.7326	700	0.7151	-1.1016	-1.1406	0.5312	0.0425	-404.0	-428.0	-10.25	-10.625
0.6221	0.8373	800	0.7139	-1.2188	-1.2891	0.5527	0.0684	-418.0	-440.0	-12.0	-12.25
0.6163	0.9419	900	0.6779	-1.7031	-1.8984	0.5703	0.1914	-478.0	-488.0	-9.375	-10.0
0.1598	1.0466	1000	0.8233	-3.125	-3.3438	0.5645	0.2344	-624.0	-632.0	-11.3125	-12.3125
0.176	1.1512	1100	0.9250	-2.9688	-3.0781	0.5176	0.1177	-596.0	-616.0	-11.125	-12.0625
0.1652	1.2559	1200	0.8757	-3.75	-3.9688	0.5449	0.2305	-684.0	-692.0	-8.4375	-9.5625
0.1472	1.3605	1300	0.8840	-3.4219	-3.7188	0.5488	0.2910	-660.0	-660.0	-9.5625	-10.625
0.1283	1.4652	1400	0.9069	-3.9688	-4.2812	0.5449	0.3125	-716.0	-716.0	-8.625	-9.8125
0.1421	1.5699	1500	0.8969	-3.625	-3.9062	0.5488	0.2832	-680.0	-680.0	-9.5	-10.4375
0.1378	1.6745	1600	0.9229	-4.4062	-4.75	0.5391	0.3320	-764.0	-760.0	-8.5625	-9.5
0.1541	1.7792	1700	0.8930	-3.9375	-4.2812	0.5371	0.3379	-716.0	-712.0	-8.8125	-9.8125
0.1227	1.8838	1800	0.9257	-4.1875	-4.5312	0.5410	0.3457	-744.0	-736.0	-7.3438	-8.5
0.1246	1.9885	1900	0.8994	-4.0938	-4.375	0.5371	0.3086	-728.0	-728.0	-8.1875	-9.1875
0.0233	2.0931	2000	1.0906	-5.875	-6.2188	0.5352	0.3262	-912.0	-908.0	-6.2188	-7.4062
0.0197	2.1978	2100	1.2212	-6.6875	-7.0625	0.5391	0.3574	-996.0	-988.0	-5.9375	-7.125
0.0158	2.3025	2200	1.1803	-6.4062	-6.8125	0.5430	0.4102	-968.0	-960.0	-5.75	-6.9688
0.016	2.4071	2300	1.2299	-6.75	-7.1562	0.5410	0.3906	-1004.0	-996.0	-5.3125	-6.5625
0.0135	2.5118	2400	1.2395	-7.0312	-7.4062	0.5391	0.3906	-1032.0	-1020.0	-4.5625	-5.8438
0.0175	2.6164	2500	1.2537	-6.8125	-7.1875	0.5352	0.3867	-1008.0	-1000.0	-4.9062	-6.1875
0.0144	2.7211	2600	1.2442	-6.9375	-7.3125	0.5371	0.4043	-1020.0	-1012.0	-4.8125	-6.0938
0.0107	2.8257	2700	1.2346	-6.875	-7.2812	0.5352	0.4023	-1016.0	-1004.0	-4.9062	-6.1875
0.0206	2.9304	2800	1.2342	-6.8438	-7.25	0.5410	0.4062	-1016.0	-1004.0	-4.9688	-6.25

Framework versions

Transformers 4.44.2
Pytorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1
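To compare a local environment against the versions listed above, the standard-library `importlib.metadata` can report what is installed. The sketch below does this. PyTorch is distributed on PyPI under the name `torch`. Treating a local build suffix such as `+cu121` as a match is an assumption of this sketch, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; "Pytorch" is installed as "torch".
PINNED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned=PINNED, lookup=version):
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing releases.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches().items():
        print(f"{name}: expected {want}, found {have}")
```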

CharlesLi/OpenELM-1_1B-DPO-full-most-similar
