LeeSB committed on
Commit
32f8791
1 Parent(s): 4066017

Model save

README.md CHANGED
@@ -5,7 +5,7 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
-base_model: mistralai/Mistral-7B-v0.1
+base_model: alignment-handbook/zephyr-7b-sft-full
 model-index:
 - name: zephyr-7b-dpo-qlora
   results: []
@@ -16,17 +16,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-qlora
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4889
-- Rewards/chosen: -2.5570
-- Rewards/rejected: -3.6602
-- Rewards/accuracies: 0.7420
-- Rewards/margins: 1.1032
-- Logps/rejected: -610.6146
-- Logps/chosen: -520.3311
-- Logits/rejected: -1.1934
-- Logits/chosen: -1.3156
+- Loss: 0.5056
+- Rewards/chosen: -1.4058
+- Rewards/rejected: -2.2921
+- Rewards/accuracies: 0.7345
+- Rewards/margins: 0.8863
+- Logps/rejected: -492.4564
+- Logps/chosen: -425.7130
+- Logits/rejected: -1.8131
+- Logits/chosen: -1.9265
 
 ## Model description
 
@@ -61,44 +61,44 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6822 | 0.03 | 100 | 0.6822 | 0.0500 | 0.0271 | 0.6620 | 0.0229 | -241.8811 | -259.6322 | -1.9601 | -2.0996 |
-| 0.6491 | 0.05 | 200 | 0.6486 | -0.0620 | -0.1691 | 0.6800 | 0.1071 | -261.5014 | -270.8282 | -1.9371 | -2.0738 |
-| 0.6171 | 0.08 | 300 | 0.6234 | -0.3309 | -0.5345 | 0.6725 | 0.2036 | -298.0363 | -297.7160 | -1.8915 | -2.0253 |
-| 0.6176 | 0.1 | 400 | 0.5988 | -0.7494 | -1.0723 | 0.6810 | 0.3229 | -351.8179 | -339.5648 | -1.8181 | -1.9461 |
-| 0.5761 | 0.13 | 500 | 0.5704 | -1.2031 | -1.7242 | 0.6925 | 0.5211 | -417.0116 | -384.9406 | -1.6429 | -1.7682 |
-| 0.5583 | 0.16 | 600 | 0.5622 | -0.8290 | -1.3172 | 0.7055 | 0.4881 | -376.3094 | -347.5316 | -1.6199 | -1.7496 |
-| 0.5297 | 0.18 | 700 | 0.5517 | -1.1832 | -1.8211 | 0.7115 | 0.6379 | -426.7012 | -382.9457 | -1.5803 | -1.7075 |
-| 0.5161 | 0.21 | 800 | 0.5413 | -1.6079 | -2.2528 | 0.7135 | 0.6449 | -469.8732 | -425.4197 | -1.5290 | -1.6555 |
-| 0.5089 | 0.24 | 900 | 0.5513 | -1.1205 | -1.7563 | 0.7120 | 0.6357 | -420.2160 | -376.6812 | -1.3977 | -1.5150 |
-| 0.5577 | 0.26 | 1000 | 0.5359 | -1.4373 | -2.1710 | 0.7200 | 0.7337 | -461.6903 | -408.3596 | -1.3512 | -1.4712 |
-| 0.5701 | 0.29 | 1100 | 0.5276 | -1.0004 | -1.6579 | 0.7315 | 0.6575 | -410.3777 | -364.6696 | -1.3514 | -1.4728 |
-| 0.5581 | 0.31 | 1200 | 0.5236 | -1.1076 | -1.8024 | 0.7300 | 0.6948 | -424.8326 | -375.3857 | -1.3054 | -1.4257 |
-| 0.5446 | 0.34 | 1300 | 0.5300 | -2.2586 | -3.2287 | 0.7235 | 0.9701 | -567.4619 | -490.4902 | -0.9841 | -1.0957 |
-| 0.5288 | 0.37 | 1400 | 0.5134 | -1.6785 | -2.5620 | 0.7350 | 0.8835 | -500.7915 | -432.4789 | -1.1189 | -1.2369 |
-| 0.4638 | 0.39 | 1500 | 0.5280 | -2.2152 | -3.3623 | 0.7325 | 1.1471 | -580.8159 | -486.1478 | -1.0978 | -1.2233 |
-| 0.5653 | 0.42 | 1600 | 0.5065 | -1.8073 | -2.7765 | 0.7360 | 0.9692 | -522.2392 | -445.3528 | -1.2289 | -1.3479 |
-| 0.5129 | 0.44 | 1700 | 0.5115 | -2.6322 | -3.6578 | 0.7290 | 1.0256 | -610.3751 | -527.8514 | -1.1655 | -1.2898 |
-| 0.464 | 0.47 | 1800 | 0.5067 | -2.5458 | -3.6660 | 0.7360 | 1.1202 | -611.1868 | -519.2065 | -1.1092 | -1.2262 |
-| 0.4435 | 0.5 | 1900 | 0.5028 | -2.4198 | -3.5101 | 0.7295 | 1.0903 | -595.5961 | -506.6063 | -1.1773 | -1.2970 |
-| 0.4722 | 0.52 | 2000 | 0.5024 | -2.8634 | -3.9045 | 0.7370 | 1.0411 | -635.0359 | -550.9646 | -1.1621 | -1.2850 |
-| 0.4946 | 0.55 | 2100 | 0.4990 | -2.5939 | -3.6584 | 0.7405 | 1.0645 | -610.4345 | -524.0187 | -1.2223 | -1.3435 |
-| 0.4809 | 0.58 | 2200 | 0.4960 | -1.9937 | -2.9287 | 0.7400 | 0.9350 | -537.4633 | -464.0007 | -1.2750 | -1.3983 |
-| 0.4721 | 0.6 | 2300 | 0.4994 | -2.7426 | -3.9056 | 0.7410 | 1.1630 | -635.1489 | -538.8865 | -1.1593 | -1.2804 |
-| 0.4693 | 0.63 | 2400 | 0.4980 | -2.6255 | -3.7698 | 0.7405 | 1.1443 | -621.5709 | -527.1746 | -1.0849 | -1.2053 |
-| 0.5 | 0.65 | 2500 | 0.4928 | -2.3522 | -3.4480 | 0.7425 | 1.0959 | -589.3930 | -499.8447 | -1.1667 | -1.2915 |
-| 0.4706 | 0.68 | 2600 | 0.4921 | -2.3971 | -3.4902 | 0.7390 | 1.0931 | -593.6089 | -504.3380 | -1.1721 | -1.2961 |
-| 0.5242 | 0.71 | 2700 | 0.4933 | -2.5905 | -3.7015 | 0.7390 | 1.1110 | -614.7410 | -523.6794 | -1.1556 | -1.2788 |
-| 0.4557 | 0.73 | 2800 | 0.4921 | -2.4710 | -3.5949 | 0.7400 | 1.1239 | -604.0808 | -511.7323 | -1.1781 | -1.3009 |
-| 0.523 | 0.76 | 2900 | 0.4899 | -2.5572 | -3.6406 | 0.7435 | 1.0834 | -608.6472 | -520.3428 | -1.1831 | -1.3050 |
-| 0.4588 | 0.79 | 3000 | 0.4897 | -2.5669 | -3.6213 | 0.7415 | 1.0544 | -606.7161 | -521.3174 | -1.1914 | -1.3136 |
-| 0.5038 | 0.81 | 3100 | 0.4894 | -2.6148 | -3.7110 | 0.7400 | 1.0961 | -615.6882 | -526.1104 | -1.1866 | -1.3089 |
-| 0.5 | 0.84 | 3200 | 0.4889 | -2.5558 | -3.6512 | 0.7435 | 1.0955 | -609.7109 | -520.2028 | -1.1907 | -1.3130 |
-| 0.5164 | 0.86 | 3300 | 0.4891 | -2.5467 | -3.6430 | 0.7415 | 1.0963 | -608.8884 | -519.2968 | -1.1940 | -1.3162 |
-| 0.4554 | 0.89 | 3400 | 0.4889 | -2.5665 | -3.6678 | 0.7410 | 1.1014 | -611.3746 | -521.2744 | -1.1941 | -1.3162 |
-| 0.5354 | 0.92 | 3500 | 0.4888 | -2.5581 | -3.6613 | 0.7410 | 1.1032 | -610.7186 | -520.4333 | -1.1966 | -1.3187 |
-| 0.4576 | 0.94 | 3600 | 0.4890 | -2.5580 | -3.6613 | 0.7395 | 1.1033 | -610.7242 | -520.4294 | -1.1960 | -1.3180 |
-| 0.4816 | 0.97 | 3700 | 0.4889 | -2.5574 | -3.6608 | 0.7410 | 1.1034 | -610.6686 | -520.3651 | -1.1920 | -1.3143 |
-| 0.5057 | 0.99 | 3800 | 0.4889 | -2.5570 | -3.6602 | 0.7420 | 1.1032 | -610.6146 | -520.3311 | -1.1934 | -1.3156 |
+| 0.6896 | 0.03 | 100 | 0.6884 | 0.0072 | -0.0023 | 0.6745 | 0.0096 | -263.4773 | -284.4093 | -2.5586 | -2.6909 |
+| 0.6699 | 0.05 | 200 | 0.6726 | 0.0082 | -0.0359 | 0.6895 | 0.0441 | -266.8299 | -284.3103 | -2.5495 | -2.6813 |
+| 0.636 | 0.08 | 300 | 0.6466 | -0.0002 | -0.1125 | 0.6780 | 0.1123 | -274.4987 | -285.1534 | -2.5520 | -2.6823 |
+| 0.6312 | 0.1 | 400 | 0.6191 | -0.2138 | -0.4222 | 0.6805 | 0.2084 | -305.4655 | -306.5131 | -2.5251 | -2.6530 |
+| 0.5918 | 0.13 | 500 | 0.6031 | -0.2627 | -0.5412 | 0.6880 | 0.2785 | -317.3649 | -311.4058 | -2.5298 | -2.6575 |
+| 0.6012 | 0.16 | 600 | 0.5928 | -0.5129 | -0.8554 | 0.6935 | 0.3424 | -348.7829 | -336.4283 | -2.5443 | -2.6737 |
+| 0.5823 | 0.18 | 700 | 0.5811 | -0.5775 | -1.0207 | 0.7000 | 0.4432 | -365.3115 | -342.8825 | -2.3446 | -2.4662 |
+| 0.5502 | 0.21 | 800 | 0.5688 | -0.5710 | -1.0329 | 0.7040 | 0.4619 | -366.5324 | -342.2334 | -2.3173 | -2.4395 |
+| 0.551 | 0.24 | 900 | 0.5723 | -0.5585 | -1.0146 | 0.7100 | 0.4561 | -364.7085 | -340.9870 | -2.2573 | -2.3767 |
+| 0.5684 | 0.26 | 1000 | 0.5602 | -0.7542 | -1.3111 | 0.7070 | 0.5569 | -394.3551 | -360.5555 | -2.2283 | -2.3464 |
+| 0.5722 | 0.29 | 1100 | 0.5429 | -0.7936 | -1.4574 | 0.7240 | 0.6638 | -408.9803 | -364.4904 | -2.0677 | -2.1820 |
+| 0.5866 | 0.31 | 1200 | 0.5338 | -1.0463 | -1.7337 | 0.7205 | 0.6874 | -436.6128 | -389.7662 | -2.0249 | -2.1388 |
+| 0.5659 | 0.34 | 1300 | 0.5310 | -0.8607 | -1.5398 | 0.7310 | 0.6792 | -417.2296 | -371.2006 | -1.9893 | -2.1049 |
+| 0.5625 | 0.37 | 1400 | 0.5295 | -0.7999 | -1.5056 | 0.7215 | 0.7058 | -413.8092 | -365.1206 | -1.9254 | -2.0391 |
+| 0.4575 | 0.39 | 1500 | 0.5266 | -1.1455 | -1.9646 | 0.7260 | 0.8191 | -459.7086 | -399.6889 | -1.9105 | -2.0252 |
+| 0.5855 | 0.42 | 1600 | 0.5227 | -1.0359 | -1.7628 | 0.7345 | 0.7269 | -439.5246 | -388.7278 | -1.9276 | -2.0403 |
+| 0.5333 | 0.44 | 1700 | 0.5155 | -1.1618 | -1.9731 | 0.7310 | 0.8113 | -460.5566 | -401.3148 | -1.9572 | -2.0732 |
+| 0.5055 | 0.47 | 1800 | 0.5181 | -1.1105 | -1.8968 | 0.7330 | 0.7863 | -452.9257 | -396.1870 | -1.9572 | -2.0727 |
+| 0.4687 | 0.5 | 1900 | 0.5198 | -1.4078 | -2.3064 | 0.7290 | 0.8986 | -493.8867 | -425.9163 | -1.8519 | -1.9678 |
+| 0.4936 | 0.52 | 2000 | 0.5123 | -1.4097 | -2.2536 | 0.7290 | 0.8438 | -488.6001 | -426.1056 | -1.8371 | -1.9508 |
+| 0.5058 | 0.55 | 2100 | 0.5121 | -1.4030 | -2.2804 | 0.7320 | 0.8773 | -491.2808 | -425.4353 | -1.8156 | -1.9302 |
+| 0.491 | 0.58 | 2200 | 0.5102 | -1.2883 | -2.1172 | 0.7300 | 0.8289 | -474.9657 | -413.9656 | -1.8893 | -2.0053 |
+| 0.4923 | 0.6 | 2300 | 0.5107 | -1.2460 | -2.0925 | 0.7320 | 0.8465 | -472.4916 | -409.7295 | -1.8949 | -2.0101 |
+| 0.4718 | 0.63 | 2400 | 0.5093 | -1.3443 | -2.1905 | 0.7265 | 0.8462 | -482.2936 | -419.5653 | -1.8618 | -1.9757 |
+| 0.5187 | 0.65 | 2500 | 0.5103 | -1.3233 | -2.2095 | 0.7285 | 0.8862 | -484.1988 | -417.4668 | -1.8641 | -1.9796 |
+| 0.5025 | 0.68 | 2600 | 0.5115 | -1.2910 | -2.1842 | 0.7315 | 0.8932 | -481.6620 | -414.2359 | -1.8388 | -1.9538 |
+| 0.4946 | 0.71 | 2700 | 0.5094 | -1.3454 | -2.2424 | 0.7300 | 0.8970 | -487.4804 | -419.6713 | -1.8200 | -1.9339 |
+| 0.5054 | 0.73 | 2800 | 0.5085 | -1.4083 | -2.3252 | 0.7320 | 0.9169 | -495.7629 | -425.9614 | -1.8042 | -1.9180 |
+| 0.5159 | 0.76 | 2900 | 0.5066 | -1.3467 | -2.2328 | 0.7320 | 0.8861 | -486.5227 | -419.8022 | -1.8193 | -1.9330 |
+| 0.4671 | 0.79 | 3000 | 0.5062 | -1.4194 | -2.3064 | 0.7325 | 0.8870 | -493.8865 | -427.0751 | -1.8140 | -1.9274 |
+| 0.4864 | 0.81 | 3100 | 0.5059 | -1.4248 | -2.3084 | 0.7330 | 0.8836 | -494.0863 | -427.6172 | -1.8158 | -1.9291 |
+| 0.5101 | 0.84 | 3200 | 0.5056 | -1.4159 | -2.2981 | 0.7340 | 0.8821 | -493.0526 | -426.7279 | -1.8167 | -1.9300 |
+| 0.5317 | 0.86 | 3300 | 0.5056 | -1.4029 | -2.2863 | 0.7355 | 0.8834 | -491.8742 | -425.4280 | -1.8139 | -1.9273 |
+| 0.4668 | 0.89 | 3400 | 0.5055 | -1.4064 | -2.2921 | 0.7350 | 0.8857 | -492.4527 | -425.7719 | -1.8132 | -1.9266 |
+| 0.5671 | 0.92 | 3500 | 0.5056 | -1.4036 | -2.2899 | 0.7345 | 0.8863 | -492.2395 | -425.4986 | -1.8158 | -1.9291 |
+| 0.4708 | 0.94 | 3600 | 0.5056 | -1.4050 | -2.2912 | 0.7345 | 0.8862 | -492.3603 | -425.6342 | -1.8127 | -1.9261 |
+| 0.4904 | 0.97 | 3700 | 0.5054 | -1.4047 | -2.2913 | 0.7355 | 0.8866 | -492.3736 | -425.6043 | -1.8155 | -1.9289 |
+| 0.5001 | 0.99 | 3800 | 0.5056 | -1.4058 | -2.2921 | 0.7345 | 0.8863 | -492.4564 | -425.7130 | -1.8131 | -1.9265 |
 
 
 ### Framework versions
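A note on the metric columns above: the card is generated by TRL's `DPOTrainer`, whose `Rewards/*` columns are the implicit DPO rewards derived from the policy-to-reference log-probability ratio. The block below restates the standard DPO definitions (with temperature β and frozen SFT reference policy π_ref); it is generic background, not anything recorded in this commit:

```latex
% Implicit reward assigned by the policy \pi_\theta relative to the frozen
% reference \pi_{\mathrm{ref}} (\beta is the DPO temperature):
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% Rewards/margins is the chosen-minus-rejected reward gap, Rewards/accuracies
% the fraction of pairs with a positive gap, and the DPO loss maximizes it:
\mathrm{margin}(x) = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(\mathrm{margin}(x)\bigr)
```

Read this way, the final eval row (margin 0.8863, accuracy 0.7345) means the updated adapter ranks the chosen response above the rejected one for roughly 73% of held-out preference pairs.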
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:923ceec33abc352e1d8ba4ff849344216f7dc442a4fe5738f6d1fb44d2f665bd
+oid sha256:e237784759ae82d05007c64de0d6b6905c61fbdaed669850c91f6ca78555089f
 size 671150064
all_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.5232822467709236,
-    "train_runtime": 82986.4165,
+    "train_loss": 0.5390157630246398,
+    "train_runtime": 82744.187,
     "train_samples": 61134,
-    "train_samples_per_second": 0.737,
+    "train_samples_per_second": 0.739,
     "train_steps_per_second": 0.046
 }
runs/Apr05_19-58-41_allennlp-cirrascale-50.reviz.ai2.in/events.out.tfevents.1712372721.allennlp-cirrascale-50.reviz.ai2.in.49327.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:408c8ddad6076c0eb3ab86a33a189cd1986669f39c47319b2f50c8da76749b55
-size 295334
+oid sha256:9e0c1bce4cedb7cff0133d1cbff2ee9d404c387bcc76b522de150a4ab7862798
+size 297064
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.5232822467709236,
-    "train_runtime": 82986.4165,
+    "train_loss": 0.5390157630246398,
+    "train_runtime": 82744.187,
     "train_samples": 61134,
-    "train_samples_per_second": 0.737,
+    "train_samples_per_second": 0.739,
     "train_steps_per_second": 0.046
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
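Since this commit swaps both the adapter weights (`adapter_model.safetensors`) and the base model, a minimal loading sketch may help. The adapter repo id `LeeSB/zephyr-7b-dpo-qlora` is inferred from the committer and model name and is hypothetical, and the 4-bit settings merely mirror a typical QLoRA setup rather than the exact training configuration:

```python
# Hedged sketch: attach the QLoRA DPO adapter to its 4-bit-quantized base model.
# Assumes `transformers`, `peft`, and `bitsandbytes` are installed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "alignment-handbook/zephyr-7b-sft-full"  # base_model from the updated card
adapter_id = "LeeSB/zephyr-7b-dpo-qlora"           # hypothetical adapter repo id

# Quantize the base model to 4-bit, mirroring a common QLoRA configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the LoRA weights on top of the frozen, quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```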