
zephyr-7b-dpo-qlora

This model is a QLoRA (LoRA adapter) DPO fine-tune of alignment-handbook/zephyr-7b-sft-full. It achieves the following results on the evaluation set (the metric names follow TRL's DPO conventions, summarized after the list):

  • Loss: 0.5443
  • Rewards/chosen: -0.4815
  • Rewards/rejected: -0.9488
  • Rewards/accuracies: 0.7228
  • Rewards/margins: 0.4673
  • Logps/rejected: -353.7514
  • Logps/chosen: -321.4202
  • Logits/rejected: -1.9090
  • Logits/chosen: -2.0033
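
For context, these Rewards/*, Logps/*, and Logits/* columns are the metrics logged by TRL's DPOTrainer. As a hedged summary of the usual DPO definitions (background from the DPO formulation, not stated in the original card; β is the DPO temperature, whose value for this run is not recorded here):

$$ r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big] $$

Rewards/chosen and Rewards/rejected are the mean implicit rewards of the chosen and rejected responses, Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and Logps/* are the total log-probabilities of each response under the policy.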

Model description

More information needed

Intended uses & limitations

More information needed
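
Although this section is unfilled, the repository is a PEFT (LoRA) adapter, so it can be attached to the base model with peft. A minimal, hedged usage sketch (the repo ids come from this card; the 4-bit settings, generation parameters, and the assumption that the adapter repo ships a tokenizer are mine):

```python
# Hedged example (not from the original card): load the QLoRA DPO adapter on top of
# the SFT base model. Drop the 4-bit config if you have enough GPU memory.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

adapter_id = "LeeSB/zephyr-7b-dpo-qlora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# AutoPeftModelForCausalLM reads the adapter config, downloads the base model
# (alignment-handbook/zephyr-7b-sft-full), and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# If the adapter repo has no tokenizer files, load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```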

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a training sketch based on them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
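
These settings map onto TRL's DPOTrainer, the trainer typically used for QLoRA DPO runs on zephyr-7b-sft-full. A minimal sketch, assuming the TRL ~0.7.x API (plain TrainingArguments plus a beta argument; newer TRL releases use DPOConfig instead). Only the values marked as coming from this card are taken from the list above; the toy dataset, LoRA configuration, beta, bf16, and sequence lengths are illustrative assumptions.

```python
# Hedged reconstruction of the training setup implied by the hyperparameters above.
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(   # QLoRA: 4-bit base weights
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

peft_config = LoraConfig(  # adapter shape is an assumption; the card does not list it
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder preference data in DPOTrainer's prompt/chosen/rejected format.
toy = Dataset.from_dict({
    "prompt": ["What does QLoRA do?"],
    "chosen": ["It fine-tunes low-rank adapters on top of a 4-bit quantized base model."],
    "rejected": ["No idea."],
})

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,              # from this card
    per_device_train_batch_size=4,   # train_batch_size; x4 GPUs = total_train_batch_size 16
    per_device_eval_batch_size=8,    # eval_batch_size; x4 GPUs = total_eval_batch_size 32
    num_train_epochs=1,              # from this card
    lr_scheduler_type="cosine",      # from this card
    warmup_ratio=0.1,                # from this card
    seed=42,                         # from this card
    evaluation_strategy="steps",
    eval_steps=100,                  # matches the 100-step eval cadence in the results table
    bf16=True,                       # assumption; precision is not stated in the card
    # default AdamW already uses betas=(0.9, 0.999) and epsilon=1e-08, as listed above
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base weights act as the reference
    args=args,
    beta=0.1,              # assumed DPO temperature; not recorded in the card
    train_dataset=toy,
    eval_dataset=toy,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,       # assumed sequence limits
    max_prompt_length=512,
)
trainer.train()
```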

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6906 | 0.03 | 100 | 0.6904 | 0.0079 | 0.0034 | 0.5998 | 0.0045 | -258.5292 | -272.4813 | -2.5792 | -2.6794 |
| 0.6832 | 0.05 | 200 | 0.6789 | 0.0377 | 0.0100 | 0.6310 | 0.0277 | -257.8704 | -269.5003 | -2.5936 | -2.6947 |
| 0.6593 | 0.08 | 300 | 0.6569 | 0.0387 | -0.0302 | 0.6613 | 0.0689 | -261.8914 | -269.4003 | -2.5863 | -2.6882 |
| 0.6187 | 0.11 | 400 | 0.6299 | -0.0829 | -0.2296 | 0.6633 | 0.1467 | -281.8324 | -281.5615 | -2.5720 | -2.6736 |
| 0.6725 | 0.13 | 500 | 0.6312 | 0.0386 | -0.1012 | 0.6724 | 0.1399 | -268.9954 | -269.4083 | -2.5082 | -2.6086 |
| 0.6094 | 0.16 | 600 | 0.6134 | -0.0997 | -0.2926 | 0.6724 | 0.1929 | -288.1276 | -283.2419 | -2.4269 | -2.5264 |
| 0.622 | 0.19 | 700 | 0.6138 | -0.0662 | -0.2635 | 0.6764 | 0.1974 | -285.2251 | -279.8869 | -2.4243 | -2.5237 |
| 0.6219 | 0.21 | 800 | 0.6040 | -0.0706 | -0.3182 | 0.6905 | 0.2476 | -290.6908 | -280.3326 | -2.3385 | -2.4372 |
| 0.623 | 0.24 | 900 | 0.5902 | -0.3711 | -0.7045 | 0.6855 | 0.3334 | -329.3168 | -310.3764 | -2.3155 | -2.4141 |
| 0.6097 | 0.27 | 1000 | 0.5900 | -0.4292 | -0.7963 | 0.6734 | 0.3670 | -338.4981 | -316.1939 | -2.1784 | -2.2752 |
| 0.6136 | 0.29 | 1100 | 0.5784 | -0.1724 | -0.5165 | 0.7046 | 0.3441 | -310.5219 | -290.5160 | -2.1727 | -2.2710 |
| 0.6567 | 0.32 | 1200 | 0.5606 | -0.3899 | -0.8591 | 0.7188 | 0.4691 | -344.7808 | -312.2660 | -2.1098 | -2.2080 |
| 0.643 | 0.35 | 1300 | 0.5635 | -0.4687 | -0.8505 | 0.7107 | 0.3818 | -343.9194 | -320.1420 | -2.1156 | -2.2131 |
| 0.5965 | 0.37 | 1400 | 0.5605 | -0.5007 | -0.9440 | 0.6925 | 0.4433 | -353.2680 | -323.3370 | -2.0815 | -2.1787 |
| 0.5845 | 0.4 | 1500 | 0.5532 | -0.4826 | -0.9062 | 0.7137 | 0.4236 | -349.4915 | -321.5331 | -2.1051 | -2.2018 |
| 0.5877 | 0.43 | 1600 | 0.5531 | -0.4294 | -0.8415 | 0.7248 | 0.4122 | -343.0254 | -316.2077 | -2.0961 | -2.1928 |
| 0.5718 | 0.45 | 1700 | 0.5541 | -0.4126 | -0.8317 | 0.7167 | 0.4192 | -342.0433 | -314.5273 | -2.0529 | -2.1491 |
| 0.5572 | 0.48 | 1800 | 0.5574 | -0.4879 | -0.8998 | 0.7177 | 0.4119 | -348.8538 | -322.0596 | -2.0373 | -2.1342 |
| 0.6168 | 0.51 | 1900 | 0.5588 | -0.4436 | -0.8496 | 0.7046 | 0.4060 | -343.8322 | -317.6336 | -2.0368 | -2.1330 |
| 0.5584 | 0.53 | 2000 | 0.5511 | -0.4453 | -0.8705 | 0.7188 | 0.4252 | -345.9210 | -317.8035 | -2.0010 | -2.0971 |
| 0.5863 | 0.56 | 2100 | 0.5504 | -0.4714 | -0.9080 | 0.7218 | 0.4365 | -349.6664 | -320.4117 | -1.9625 | -2.0577 |
| 0.5805 | 0.59 | 2200 | 0.5509 | -0.4003 | -0.8341 | 0.7218 | 0.4338 | -342.2770 | -313.2971 | -1.9579 | -2.0528 |
| 0.5853 | 0.61 | 2300 | 0.5429 | -0.4611 | -0.9355 | 0.7188 | 0.4743 | -352.4162 | -319.3822 | -1.9440 | -2.0390 |
| 0.5561 | 0.64 | 2400 | 0.5407 | -0.5040 | -0.9820 | 0.7208 | 0.4780 | -357.0744 | -323.6726 | -1.9359 | -2.0311 |
| 0.559 | 0.67 | 2500 | 0.5508 | -0.4111 | -0.8553 | 0.7208 | 0.4442 | -344.4033 | -314.3806 | -1.9377 | -2.0324 |
| 0.5803 | 0.69 | 2600 | 0.5507 | -0.4118 | -0.8515 | 0.7218 | 0.4397 | -344.0258 | -314.4533 | -1.9263 | -2.0208 |
| 0.5537 | 0.72 | 2700 | 0.5453 | -0.4866 | -0.9460 | 0.7188 | 0.4594 | -353.4755 | -321.9308 | -1.9121 | -2.0064 |
| 0.5562 | 0.75 | 2800 | 0.5441 | -0.5000 | -0.9628 | 0.7228 | 0.4628 | -355.1509 | -323.2741 | -1.9088 | -2.0030 |
| 0.5553 | 0.77 | 2900 | 0.5442 | -0.4959 | -0.9606 | 0.7188 | 0.4647 | -354.9320 | -322.8617 | -1.9075 | -2.0017 |
| 0.59 | 0.8 | 3000 | 0.5431 | -0.5074 | -0.9824 | 0.7188 | 0.4749 | -357.1091 | -324.0160 | -1.9088 | -2.0032 |
| 0.6168 | 0.83 | 3100 | 0.5439 | -0.4857 | -0.9579 | 0.7188 | 0.4722 | -354.6618 | -321.8407 | -1.9073 | -2.0016 |
| 0.5655 | 0.85 | 3200 | 0.5443 | -0.4817 | -0.9508 | 0.7198 | 0.4691 | -353.9527 | -321.4442 | -1.9095 | -2.0037 |
| 0.5201 | 0.88 | 3300 | 0.5441 | -0.4835 | -0.9518 | 0.7228 | 0.4683 | -354.0496 | -321.6215 | -1.9082 | -2.0024 |
| 0.5576 | 0.91 | 3400 | 0.5443 | -0.4814 | -0.9485 | 0.7248 | 0.4672 | -353.7254 | -321.4092 | -1.9113 | -2.0056 |
| 0.5929 | 0.93 | 3500 | 0.5443 | -0.4814 | -0.9481 | 0.7248 | 0.4667 | -353.6854 | -321.4121 | -1.9086 | -2.0029 |
| 0.507 | 0.96 | 3600 | 0.5443 | -0.4815 | -0.9488 | 0.7198 | 0.4673 | -353.7491 | -321.4216 | -1.9086 | -2.0028 |
| 0.5412 | 0.99 | 3700 | 0.5443 | -0.4815 | -0.9488 | 0.7228 | 0.4673 | -353.7514 | -321.4202 | -1.9090 | -2.0033 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2