# zephyr-7b-dpo-full-ultrabin-low-curriculum
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5068
- Rewards/chosen: -0.9758
- Rewards/rejected: -1.9257
- Rewards/accuracies: 0.7773
- Rewards/margins: 0.9500
- Logps/rejected: -455.2352
- Logps/chosen: -360.2064
- Logits/rejected: 1.7146
- Logits/chosen: 0.8613
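The snippet below is a minimal generation sketch for trying the model. It assumes the checkpoint is published under `sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum` (the repo name used on this page) and that the tokenizer carries a chat template inherited from zephyr-7b-sft-full; both are assumptions rather than details documented on this card.

```python
# Minimal usage sketch (repo id and chat template are assumptions; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```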
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
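For reference, one way to express the settings above with the `transformers` `TrainingArguments` API is sketched below. This is not the original training script, and DPO-specific options (such as the beta coefficient) are not reported on this card; the output directory and precision flag are assumptions.

```python
# Sketch: mapping the listed hyperparameters onto transformers.TrainingArguments.
# Not the original training script; DPO-specific settings (e.g. beta) are not given here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-ultrabin-low-curriculum",  # assumed output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 GPUs x 8 per device x 2 steps = 128 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,                       # assumption; precision is not listed on this card
)
```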
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6555 | 0.1046 | 50 | 0.6484 | -0.0250 | -0.1380 | 0.7031 | 0.1131 | -276.4668 | -265.1290 | -2.5978 | -2.6370 |
| 0.5737 | 0.2092 | 100 | 0.5621 | -0.4814 | -1.0531 | 0.7539 | 0.5717 | -367.9698 | -310.7707 | -0.7512 | -0.9088 |
| 0.5528 | 0.3138 | 150 | 0.5396 | -0.6348 | -1.3904 | 0.7539 | 0.7556 | -401.7020 | -326.1078 | 0.5012 | -0.0540 |
| 0.5224 | 0.4184 | 200 | 0.5247 | -0.6079 | -1.3937 | 0.7695 | 0.7858 | -402.0287 | -323.4210 | 0.6114 | -0.0575 |
| 0.5107 | 0.5230 | 250 | 0.5174 | -0.8253 | -1.6620 | 0.7617 | 0.8366 | -428.8582 | -345.1613 | 1.5464 | 0.8133 |
| 0.5088 | 0.6276 | 300 | 0.5116 | -0.9508 | -1.8003 | 0.7695 | 0.8495 | -442.6956 | -357.7103 | 1.4450 | 0.6887 |
| 0.5011 | 0.7322 | 350 | 0.5125 | -1.2146 | -2.1455 | 0.7812 | 0.9309 | -477.2163 | -384.0933 | 1.9702 | 1.1437 |
| 0.4881 | 0.8368 | 400 | 0.5087 | -1.1026 | -2.0491 | 0.7852 | 0.9465 | -467.5729 | -372.8942 | 1.9158 | 1.0685 |
| 0.4904 | 0.9414 | 450 | 0.5069 | -0.9697 | -1.9174 | 0.7773 | 0.9476 | -454.4013 | -359.6043 | 1.6890 | 0.8343 |
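The reward columns above follow the usual DPO convention: the implicit reward of a completion is beta times the gap between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where that margin is positive. The sketch below shows that bookkeeping; beta is an assumption (commonly 0.1) since it is not reported on this card.

```python
# Sketch of how the Rewards/* columns are typically derived during DPO training.
# beta is an assumption (often 0.1); the actual value is not listed on this card.
def dpo_rewards(policy_logp_chosen, ref_logp_chosen,
                policy_logp_rejected, ref_logp_rejected, beta=0.1):
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # Rewards/accuracies is the average of this indicator over the eval batch.
    accurate = reward_chosen > reward_rejected
    return reward_chosen, reward_rejected, margin, accurate
```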
### Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
### Model tree for sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum

- Base model: mistralai/Mistral-7B-v0.1
- Finetuned: alignment-handbook/zephyr-7b-sft-full