---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---
# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0136
- Rewards/chosen: -376.5464
- Rewards/rejected: -330.4243
- Rewards/accuracies: 0.4544
- Rewards/margins: -46.1221
- Logps/rejected: -33295.5859
- Logps/chosen: -37927.6367
- Neglected: 256.0
- Selected: 0.0
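
As the name indicates, this checkpoint was trained with QLoRA, so the natural way to use it is as a PEFT adapter on top of the base model. Below is a minimal inference sketch, assuming the checkpoint is published as a PEFT adapter; the adapter repo id is a placeholder, not a confirmed location.

```python
# A minimal loading sketch, assuming this checkpoint is published as a PEFT
# (QLoRA) adapter on top of mistralai/Mistral-7B-v0.1. The adapter repo id
# below is a placeholder.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "your-namespace/zephyr-7b-dpo-qlora"  # placeholder

# AutoPeftModelForCausalLM reads the adapter config, loads the base model it
# points at, and attaches the trained LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
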
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a trainer configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
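
For context, these settings map onto a TRL `DPOTrainer` run roughly as follows. This is a reconstruction sketch, not the actual training script: the LoRA configuration and the dataset are assumptions (the card records the dataset as None), and the API shown is the TRL 0.7-era signature that matches the pinned Transformers 4.35.

```python
# Reconstruction sketch of a DPOTrainer setup matching the hyperparameters
# above (TRL 0.7-era API). LoRA ranks and the dataset are assumptions; the
# card does not record them.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# 4-bit base model for QLoRA training.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

peft_config = LoraConfig(  # illustrative values, not taken from this card
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

# 4 per device x 4 GPUs x 4 accumulation steps = total train batch size 64.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

# Assumed preference dataset with prompt/chosen/rejected columns.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                             split="train_prefs")

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT model, TRL derives the reference internally
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```
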
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Neglected | Selected |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------:|:--------:|
| 0.6727 | 0.1 | 100 | 0.6631 | 0.0074 | -0.0332 | 0.7024 | 0.0405 | -256.4745 | -272.2623 | 256.0 | 0.0 |
| 0.0392 | 0.21 | 200 | 0.0276 | -119.9914 | -105.4188 | 0.4464 | -14.5726 | -10795.0420 | -12272.1426 | 256.0 | 0.0 |
| 0.0208 | 0.31 | 300 | 0.0199 | -281.3865 | -245.2151 | 0.4444 | -36.1714 | -24774.6660 | -28411.6465 | 256.0 | 0.0 |
| 0.0157 | 0.42 | 400 | 0.0161 | -353.7562 | -307.1862 | 0.4563 | -46.5699 | -30971.7832 | -35648.6172 | 256.0 | 0.0 |
| 0.0182 | 0.52 | 500 | 0.0148 | -331.5956 | -289.6645 | 0.4464 | -41.9311 | -29219.6113 | -33432.5625 | 256.0 | 0.0 |
| 0.013 | 0.63 | 600 | 0.0143 | -356.6841 | -312.4188 | 0.4544 | -44.2654 | -31495.0312 | -35941.4141 | 256.0 | 0.0 |
| 0.0165 | 0.73 | 700 | 0.0143 | -353.6940 | -310.5345 | 0.4504 | -43.1595 | -31306.6094 | -35642.4023 | 256.0 | 0.0 |
| 0.0145 | 0.84 | 800 | 0.0135 | -374.0797 | -328.2772 | 0.4544 | -45.8026 | -33080.8789 | -37680.9766 | 256.0 | 0.0 |
| 0.0195 | 0.94 | 900 | 0.0137 | -376.5184 | -330.4032 | 0.4544 | -46.1152 | -33293.4727 | -37924.8398 | 256.0 | 0.0 |
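
To read the reward columns above: DPO defines an implicit per-example reward as β times the log-probability ratio between the policy and a frozen reference model, so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`, and `Rewards/accuracies` is the fraction of pairs with a positive margin. A sketch of the computation follows; β = 0.1 is an assumed default, not recorded in this card.

```python
# How DPO's implicit rewards relate to the columns in the table above.
# beta = 0.1 is an assumed default; the value used here is not recorded.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected   # Rewards/margins
    accuracy = (margins > 0).float().mean()       # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()          # the DPO objective
    return (rewards_chosen.mean(), rewards_rejected.mean(),
            margins.mean(), accuracy, loss)
```
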
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1