llama3_8b_instruct_dpo_bwgenerator_v2
This model is a fine-tuned version of NanQiangHF/llama3_8b_instruct_bwgenerator on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3494
- Rewards/chosen: -0.5156
- Rewards/rejected: -2.0278
- Rewards/accuracies: 0.8713
- Rewards/margins: 1.5122
- Logps/rejected: -88.0817
- Logps/chosen: -43.7343
- Logits/rejected: 0.7079
- Logits/chosen: 0.1945
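For context on the metrics above: in DPO, the implicit reward of a response is the beta-scaled log-ratio between the policy and the reference model, and the margin is the difference between the chosen and rejected rewards. A sketch of the standard definitions (the beta value and reference model are not stated in this card and are assumed from the usual DPO setup):

```latex
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\big( r(x, y_{\mathrm{chosen}}) - r(x, y_{\mathrm{rejected}}) \big)
```

Under these definitions, Rewards/margins is Rewards/chosen minus Rewards/rejected (here -0.5156 - (-2.0278) = 1.5122), and Rewards/accuracies is the fraction of evaluation pairs for which the chosen reward exceeds the rejected reward.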
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
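The card does not state the training stack, but these hyperparameters map directly onto TRL's DPOTrainer; below is a minimal sketch under that assumption. The dataset path, output directory, and adapter settings are placeholders (they are not documented here), and the exact DPOConfig/DPOTrainer signature varies across TRL versions.

```python
# Hypothetical reconstruction of the training setup; not the authors' script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "NanQiangHF/llama3_8b_instruct_bwgenerator"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# The training dataset is unknown; any preference dataset with
# "prompt" / "chosen" / "rejected" columns fits the DPOTrainer interface.
train_dataset = load_dataset("path/to/preference_dataset", split="train")  # placeholder

args = DPOConfig(
    output_dir="llama3_8b_instruct_dpo_bwgenerator_v2",
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter hyperparameters not given in the card
)
trainer.train()
```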
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.5461 | 0.0719 | 1000 | 0.4574 | -0.0823 | -0.9594 | 0.8261 | 0.8771 | -77.3979 | -39.4010 | 0.6931 | 0.1837 |
0.426 | 0.1438 | 2000 | 0.3856 | -0.3308 | -1.6338 | 0.8454 | 1.3030 | -84.1417 | -41.8860 | 0.7041 | 0.1914 |
0.3758 | 0.2157 | 3000 | 0.3593 | -0.4540 | -1.9108 | 0.8652 | 1.4567 | -86.9117 | -43.1185 | 0.7065 | 0.1933 |
0.3611 | 0.2876 | 4000 | 0.3515 | -0.5039 | -2.0063 | 0.8687 | 1.5024 | -87.8675 | -43.6177 | 0.7088 | 0.1952 |
0.3438 | 0.3595 | 5000 | 0.3502 | -0.5107 | -2.0200 | 0.8681 | 1.5093 | -88.0041 | -43.6858 | 0.7085 | 0.1951 |
0.357 | 0.4313 | 6000 | 0.3487 | -0.5159 | -2.0325 | 0.8668 | 1.5166 | -88.1288 | -43.7373 | 0.7092 | 0.1955 |
0.3562 | 0.5032 | 7000 | 0.3496 | -0.5151 | -2.0278 | 0.8707 | 1.5127 | -88.0820 | -43.7290 | 0.7093 | 0.1956 |
0.3597 | 0.5751 | 8000 | 0.3493 | -0.5179 | -2.0304 | 0.8707 | 1.5125 | -88.1081 | -43.7570 | 0.7092 | 0.1956 |
0.3437 | 0.6470 | 9000 | 0.3492 | -0.5132 | -2.0264 | 0.8691 | 1.5132 | -88.0680 | -43.7105 | 0.7109 | 0.1971 |
0.3544 | 0.7189 | 10000 | 0.3488 | -0.5160 | -2.0301 | 0.8704 | 1.5142 | -88.1054 | -43.7379 | 0.7089 | 0.1953 |
0.3451 | 0.7908 | 11000 | 0.3498 | -0.5116 | -2.0235 | 0.8694 | 1.5119 | -88.0395 | -43.6945 | 0.7089 | 0.1951 |
0.3543 | 0.8627 | 12000 | 0.3485 | -0.5155 | -2.0306 | 0.8687 | 1.5151 | -88.1099 | -43.7334 | 0.7091 | 0.1955 |
0.3609 | 0.9346 | 13000 | 0.3494 | -0.5156 | -2.0278 | 0.8713 | 1.5122 | -88.0817 | -43.7343 | 0.7079 | 0.1945 |
Framework versions
- PEFT 0.10.0
- Transformers 4.44.0
- Pytorch 2.3.0+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
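Since the published artifact is a PEFT adapter (PEFT 0.10.0 above), here is a minimal loading sketch, assuming the adapter repo id matches this card's title and the base model is the one named at the top of the card; the prompt format is not documented and is left as a placeholder.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "NanQiangHF/llama3_8b_instruct_bwgenerator"
adapter_id = "NanQiangHF/llama3_8b_instruct_dpo_bwgenerator_v2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-tuned adapter
model.eval()

prompt = "..."  # expected prompt format is not documented in this card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```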