---
library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-least-similar
    results: []
---

OpenELM-1_1B-DPO-full-least-similar

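This model was trained from scratch on an unknown dataset, according to the trainer's auto-generated record; the `trl` and `dpo` tags indicate a DPO run with TRL. It achieves the following results on the evaluation set, which match the final logged evaluation (step 2800):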

Loss: 1.0609
Rewards/chosen: -3.7969
Rewards/rejected: -4.0
Rewards/accuracies: 0.5
Rewards/margins: 0.2148
Logps/rejected: -692.0
Logps/chosen: -700.0
Logits/rejected: -12.9375
Logits/chosen: -13.25
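
For readers unfamiliar with these metric names, the sketch below shows how DPO reward quantities are commonly defined for a single preference pair with the standard sigmoid loss: the reward is the scaled log-probability ratio between the policy and a frozen reference model, the margin is chosen minus rejected, and the loss is the negative log-sigmoid of the margin. The function name `dpo_pair_metrics`, the value `beta=0.1`, and the reference log-probabilities in the usage lines are assumptions for illustration; this card records neither the beta used nor any reference-model outputs.

```python
import math


def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative DPO quantities for one preference pair.

    beta=0.1 is an assumed default; the card does not state the value used.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # Accuracy for one pair is 1 when the chosen response has the higher reward.
    accuracy = 1.0 if reward_chosen > reward_rejected else 0.0
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "loss": loss,
        "accuracy": accuracy,
    }


# Policy log-probs are the final Logps values above; the reference log-probs
# (-662.0, -652.0) are invented so the example runs, not taken from the run.
example = dpo_pair_metrics(-700.0, -692.0, -662.0, -652.0, beta=0.1)
print(example)
```

A margin of zero gives a loss of ln 2 (about 0.693), and an accuracy of 0.5 over a batch means the chosen response out-scored the rejected one in half of the pairs.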

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
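
The relationship between the per-device and total batch sizes, and the shape of the learning-rate schedule, can be sketched as follows. The batch-size arithmetic uses only the values listed above. The schedule function is a plain-Python approximation of a linear warmup followed by cosine decay to zero, which is how this scheduler type usually behaves; `cosine_lr_with_warmup` is a hypothetical helper, and `TOTAL_STEPS_ESTIMATE = 2865` is an assumption inferred from the training log (step 2800 corresponds to epoch 2.9319, roughly 955 steps per epoch), not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8      # per device
EVAL_BATCH_SIZE = 16      # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES

# Assumption: about 955 optimizer steps per epoch over 3 epochs (inferred, not stated).
TOTAL_STEPS_ESTIMATE = 2865


def cosine_lr_with_warmup(step, total_steps, base_lr=LEARNING_RATE,
                          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 100, 287, 1500, 2865):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS_ESTIMATE))
```

Under these assumptions the learning rate peaks at 5e-05 after roughly the first tenth of training and then decays smoothly toward zero by the last step.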

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.187	0.1047	100	0.6760	-0.4492	-0.5586	0.5469	0.1084	-344.0	-364.0	-14.5625	-14.6875
0.1162	0.2094	200	0.6879	-0.7734	-0.875	0.5410	0.1016	-376.0	-396.0	-11.6875	-12.0625
0.1436	0.3141	300	0.7670	-1.6562	-1.7969	0.4941	0.1377	-468.0	-484.0	-13.875	-14.0625
0.1461	0.4188	400	0.7442	-1.0469	-1.0625	0.5039	0.0201	-394.0	-422.0	-16.5	-16.625
0.1352	0.5236	500	0.8131	-1.6406	-1.7031	0.5020	0.0630	-460.0	-482.0	-15.5625	-15.6875
0.1507	0.6283	600	0.8542	-1.625	-1.6328	0.4766	0.0096	-452.0	-482.0	-17.25	-17.375
0.1278	0.7330	700	0.8274	-1.7891	-1.9453	0.4980	0.1592	-484.0	-496.0	-14.8125	-15.0
0.1303	0.8377	800	0.8349	-1.7734	-1.7969	0.5195	0.0272	-468.0	-496.0	-16.5	-16.5
0.1614	0.9424	900	0.8078	-2.2969	-2.5	0.5332	0.1992	-540.0	-548.0	-16.375	-16.375
0.0199	1.0471	1000	0.8233	-2.2656	-2.3906	0.4863	0.1279	-528.0	-544.0	-15.4375	-15.875
0.0348	1.1518	1100	0.8452	-2.0469	-2.1562	0.5039	0.1187	-504.0	-524.0	-17.0	-17.125
0.0186	1.2565	1200	0.8788	-2.9219	-3.0312	0.5098	0.1074	-592.0	-612.0	-14.75	-15.0625
0.0277	1.3613	1300	0.8304	-2.7969	-2.8906	0.5137	0.0928	-576.0	-600.0	-14.25	-14.5
0.0212	1.4660	1400	0.8990	-2.7969	-2.9062	0.5	0.1099	-580.0	-600.0	-14.25	-14.4375
0.0333	1.5707	1500	0.9111	-3.2031	-3.2969	0.5215	0.0981	-620.0	-640.0	-12.1875	-12.625
0.0163	1.6754	1600	0.9215	-3.2188	-3.3281	0.4941	0.1104	-620.0	-640.0	-11.0625	-11.5
0.0309	1.7801	1700	0.9203	-2.6719	-2.7344	0.5059	0.0635	-560.0	-584.0	-13.5625	-13.8125
0.0228	1.8848	1800	0.9032	-2.8594	-2.9531	0.4941	0.0972	-584.0	-604.0	-13.3125	-13.5625
0.0116	1.9895	1900	0.9123	-3.0156	-3.125	0.5	0.1187	-600.0	-620.0	-13.375	-13.625
0.0011	2.0942	2000	0.9715	-3.2656	-3.4531	0.4980	0.1865	-636.0	-644.0	-13.0625	-13.3125
0.0019	2.1990	2100	1.0378	-3.6719	-3.9062	0.5098	0.2393	-680.0	-684.0	-12.5	-12.8125
0.0011	2.3037	2200	1.0456	-3.7188	-3.9375	0.5020	0.2227	-684.0	-692.0	-12.8125	-13.125
0.0009	2.4084	2300	1.0567	-3.75	-3.9688	0.5020	0.2217	-684.0	-692.0	-12.9375	-13.25
0.0022	2.5131	2400	1.0450	-3.7188	-3.9062	0.4961	0.1953	-680.0	-692.0	-13.0	-13.3125
0.0013	2.6178	2500	1.0499	-3.7656	-3.9688	0.5020	0.2080	-684.0	-696.0	-12.9375	-13.25
0.0006	2.7225	2600	1.0572	-3.7812	-3.9844	0.4961	0.2100	-688.0	-696.0	-12.9375	-13.25
0.0007	2.8272	2700	1.0600	-3.7969	-4.0	0.5020	0.2168	-692.0	-700.0	-12.9375	-13.25
0.0012	2.9319	2800	1.0609	-3.7969	-4.0	0.5	0.2148	-692.0	-700.0	-12.9375	-13.25
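
One way to read the table above is to compare training loss with validation loss over time. The sketch below copies four rows from the table (steps 100, 900, 1900, and 2800) and reports the row with the lowest validation loss along with the gap between the two losses. The helper names `best_checkpoint` and `loss_gaps` are hypothetical, and interpreting a widening gap as overfitting is a common reading rather than something this card states.

```python
# (step, training_loss, validation_loss) copied from four rows of the table above.
ROWS = [
    (100, 0.187, 0.6760),
    (900, 0.1614, 0.8078),
    (1900, 0.0116, 0.9123),
    (2800, 0.0012, 1.0609),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def loss_gaps(rows):
    """Return (step, validation_loss - training_loss) for each row."""
    return [(step, val - train) for step, train, val in rows]


best_step, _, best_val = best_checkpoint(ROWS)
print(f"lowest validation loss: {best_val} at step {best_step}")
for step, gap in loss_gaps(ROWS):
    print(f"step {step}: gap {gap:.4f}")
```

In the full table, the lowest validation loss is also the step-100 value (0.6760), while training loss keeps falling through the final step.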

Framework versions

Transformers 4.44.2
PyTorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1