zephyr-7b-dpo-full-ultrabin-low-bleu-3-epochs
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a short usage sketch follows these results):
- Loss: 0.5378
- Rewards/chosen: -1.1280
- Rewards/rejected: -2.2088
- Rewards/accuracies: 0.75
- Rewards/margins: 1.0808
- Logps/rejected: -483.5383
- Logps/chosen: -375.4253
- Logits/rejected: 0.4263
- Logits/chosen: -0.3049
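As a quick-start illustration, here is a minimal sketch of loading this checkpoint with the standard transformers API and generating a reply through the tokenizer's chat template. The prompt text and generation settings are placeholders chosen for the example, not values taken from this card.

```python
# Minimal usage sketch (standard transformers AutoModelForCausalLM / AutoTokenizer).
# The prompt and generation parameters below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sfulay/zephyr-7b-dpo-full-ultrabin-low-bleu-3-epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain direct preference optimization in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
# Strip the prompt tokens and print only the generated continuation.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```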
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
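For readers who want to reproduce a similar setup, the sketch below maps the hyperparameters above onto transformers.TrainingArguments. The actual run used the alignment-handbook DPO recipe, so the output directory and the bf16 flag are assumptions, not values stated in this card.

```python
# Sketch only: the hyperparameters listed above expressed as transformers.TrainingArguments.
# output_dir and bf16 are assumptions; the original run used the alignment-handbook recipe.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-ultrabin-low-bleu-3-epochs",  # assumed output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 GPUs x 8 per device x 2 steps = total batch of 128
    seed=55,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumed precision; not stated in the card
)
```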
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6148 | 0.3484 | 50 | 0.6349 | -0.0077 | -0.1918 | 0.7344 | 0.1841 | -281.8412 | -263.4028 | -2.4711 | -2.5174 |
| 0.48 | 0.6969 | 100 | 0.5650 | -0.3580 | -0.9260 | 0.7383 | 0.5681 | -355.2653 | -298.4250 | -0.5952 | -0.8597 |
| 0.3945 | 1.0453 | 150 | 0.5507 | -0.4890 | -1.2040 | 0.7812 | 0.7150 | -383.0621 | -311.5342 | 0.6723 | 0.1059 |
| 0.3213 | 1.3937 | 200 | 0.5293 | -0.7390 | -1.5264 | 0.7617 | 0.7874 | -415.3053 | -336.5349 | 0.7764 | 0.2111 |
| 0.3246 | 1.7422 | 250 | 0.5303 | -0.7632 | -1.6866 | 0.7695 | 0.9235 | -431.3257 | -338.9464 | 0.1592 | -0.5392 |
| 0.1986 | 2.0906 | 300 | 0.5372 | -0.9618 | -1.9577 | 0.7578 | 0.9959 | -458.4319 | -358.8135 | 0.6117 | -0.1931 |
| 0.1848 | 2.4390 | 350 | 0.5348 | -1.0734 | -2.1515 | 0.7578 | 1.0781 | -477.8110 | -369.9702 | 0.4674 | -0.2725 |
| 0.1901 | 2.7875 | 400 | 0.5385 | -1.1315 | -2.2179 | 0.7461 | 1.0864 | -484.4519 | -375.7825 | 0.4592 | -0.2833 |
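To clarify the reward columns: in DPO training, the implicit reward for a completion is typically beta times the difference between the policy and reference log-probabilities, and the margin is the chosen reward minus the rejected reward. The function below is an illustrative sketch of that convention; the beta value and tensor names are assumptions, not taken from this run.

```python
# Illustrative sketch of how DPO-style reward metrics are typically derived.
# beta and the log-probability inputs are placeholders, not values from this training run.
import torch
import torch.nn.functional as F

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float = 0.1):
    # Implicit rewards: beta * (log-prob under the policy - log-prob under the frozen reference).
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    # "Rewards/accuracies": fraction of pairs where the chosen reward beats the rejected one.
    accuracies = (rewards_chosen > rewards_rejected).float().mean()
    # DPO loss: negative log-sigmoid of the reward margin, averaged over the batch.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```

Under this convention, the Rewards/margins column is simply Rewards/chosen minus Rewards/rejected; for the final logged step, -1.1315 - (-2.2179) = 1.0864, matching the table.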
Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for sfulay/zephyr-7b-dpo-full-ultrabin-low-bleu-3-epochs
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: alignment-handbook/zephyr-7b-sft-full