# zephyr-7b-dpo-full-ultrabin-low-curriculum
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5068
- Rewards/chosen: -0.9758
- Rewards/rejected: -1.9257
- Rewards/accuracies: 0.7773
- Rewards/margins: 0.9500
- Logps/rejected: -455.2352
- Logps/chosen: -360.2064
- Logits/rejected: 1.7146
- Logits/chosen: 0.8613
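The snippet below is a minimal generation sketch for trying the model. It assumes the checkpoint is published under `sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum` (the repo name used on this page) and that the tokenizer carries a chat template inherited from zephyr-7b-sft-full; both are assumptions rather than details documented on this card.

```python
# Minimal usage sketch (repo id and chat template are assumptions; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```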
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
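For reference, one way to express the settings above with the `transformers` `TrainingArguments` API is sketched below. This is not the original training script, and DPO-specific options (such as the beta coefficient) are not reported on this card; the output directory and precision flag are assumptions.

```python
# Sketch: mapping the listed hyperparameters onto transformers.TrainingArguments.
# Not the original training script; DPO-specific settings (e.g. beta) are not given here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-ultrabin-low-curriculum",  # assumed output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 GPUs x 8 per device x 2 steps = 128 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,                       # assumption; precision is not listed on this card
)
```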
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6555 | 0.1046 | 50 | 0.6484 | -0.0250 | -0.1380 | 0.7031 | 0.1131 | -276.4668 | -265.1290 | -2.5978 | -2.6370 |
| 0.5737 | 0.2092 | 100 | 0.5621 | -0.4814 | -1.0531 | 0.7539 | 0.5717 | -367.9698 | -310.7707 | -0.7512 | -0.9088 |
| 0.5528 | 0.3138 | 150 | 0.5396 | -0.6348 | -1.3904 | 0.7539 | 0.7556 | -401.7020 | -326.1078 | 0.5012 | -0.0540 |
| 0.5224 | 0.4184 | 200 | 0.5247 | -0.6079 | -1.3937 | 0.7695 | 0.7858 | -402.0287 | -323.4210 | 0.6114 | -0.0575 |
| 0.5107 | 0.5230 | 250 | 0.5174 | -0.8253 | -1.6620 | 0.7617 | 0.8366 | -428.8582 | -345.1613 | 1.5464 | 0.8133 |
| 0.5088 | 0.6276 | 300 | 0.5116 | -0.9508 | -1.8003 | 0.7695 | 0.8495 | -442.6956 | -357.7103 | 1.4450 | 0.6887 |
| 0.5011 | 0.7322 | 350 | 0.5125 | -1.2146 | -2.1455 | 0.7812 | 0.9309 | -477.2163 | -384.0933 | 1.9702 | 1.1437 |
| 0.4881 | 0.8368 | 400 | 0.5087 | -1.1026 | -2.0491 | 0.7852 | 0.9465 | -467.5729 | -372.8942 | 1.9158 | 1.0685 |
| 0.4904 | 0.9414 | 450 | 0.5069 | -0.9697 | -1.9174 | 0.7773 | 0.9476 | -454.4013 | -359.6043 | 1.6890 | 0.8343 |
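The reward columns above follow the usual DPO convention: the implicit reward of a completion is beta times the gap between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where that margin is positive. The sketch below shows that bookkeeping; beta is an assumption (commonly 0.1) since it is not reported on this card.

```python
# Sketch of how the Rewards/* columns are typically derived during DPO training.
# beta is an assumption (often 0.1); the actual value is not listed on this card.
def dpo_rewards(policy_logp_chosen, ref_logp_chosen,
                policy_logp_rejected, ref_logp_rejected, beta=0.1):
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # Rewards/accuracies is the average of this indicator over the eval batch.
    accurate = reward_chosen > reward_rejected
    return reward_chosen, reward_rejected, margin, accurate
```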
### Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
### Model tree for sfulay/zephyr-7b-dpo-full-ultrabin-low-curriculum

- Base model: mistralai/Mistral-7B-v0.1
- Finetuned: alignment-handbook/zephyr-7b-sft-full