metadata

license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []

zephyr-7b-dpo-qlora

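This model is a fine-tuned version of mistralai/Mistral-7B-v0.1. The training dataset was not recorded (the card lists it as None), and no summary of final evaluation results was filled in. The per-step evaluation metrics logged during training appear in the table at the end of this card.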

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The hyperparameters used during training are not listed in this card.

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Neglected
0.6727	0.1	100	0.6631	0.0074	-0.0332	0.7024	0.0405	-256.4745	-272.2623	256.0
0.0392	0.21	200	0.0276	-119.9914	-105.4188	0.4464	-14.5726	-10795.0420	-12272.1426	256.0
0.0208	0.31	300	0.0199	-281.3865	-245.2151	0.4444	-36.1714	-24774.6660	-28411.6465	256.0
0.0157	0.42	400	0.0161	-353.7562	-307.1862	0.4563	-46.5699	-30971.7832	-35648.6172	256.0
0.0182	0.52	500	0.0148	-331.5956	-289.6645	0.4464	-41.9311	-29219.6113	-33432.5625	256.0
0.013	0.63	600	0.0143	-356.6841	-312.4188	0.4544	-44.2654	-31495.0312	-35941.4141	256.0
0.0165	0.73	700	0.0143	-353.6940	-310.5345	0.4504	-43.1595	-31306.6094	-35642.4023	256.0
0.0145	0.84	800	0.0135	-374.0797	-328.2772	0.4544	-45.8026	-33080.8789	-37680.9766	256.0
0.0195	0.94	900	0.0137	-376.5184	-330.4032	0.4544	-46.1152	-33293.4727	-37924.8398	256.0
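
The reward columns in the table can be cross-checked with a little arithmetic. The sketch below assumes the metric names follow the convention of TRL's DPOTrainer, where Rewards/margins is the mean chosen reward minus the mean rejected reward; the card itself does not say which trainer produced the logs. The rows are copied from the table above, and the helper name `margin` is made up for illustration.

```python
# Rows copied from the table above:
# (step, rewards_chosen, rewards_rejected, reported_rewards_margins)
rows = [
    (100, 0.0074, -0.0332, 0.0405),
    (200, -119.9914, -105.4188, -14.5726),
    (500, -331.5956, -289.6645, -41.9311),
    (900, -376.5184, -330.4032, -46.1152),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin under the assumed convention: chosen minus rejected."""
    return chosen - rejected


for step, chosen, rejected, reported in rows:
    computed = margin(chosen, rejected)
    # Logged values are rounded to four decimals, so allow a small tolerance.
    assert abs(computed - reported) < 1e-3, (step, computed, reported)
    sign = "positive" if computed > 0 else "negative"
    print(f"step {step}: margin {computed:.4f} ({sign})")
```

Under this reading, a positive margin means the chosen responses receive the higher average reward. In the table, only the step-100 row has a positive margin; every later row has a negative margin and a Rewards/accuracies value below 0.5.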