beamaia committed on
Commit
d585189
1 Parent(s): 5f375f4

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. README.md +72 -69
  2. checkpoint-100/README.md +202 -0
  3. checkpoint-100/adapter_config.json +36 -0
  4. checkpoint-100/adapter_model.safetensors +3 -0
  5. checkpoint-100/optimizer.pt +3 -0
  6. checkpoint-100/rng_state.pth +3 -0
  7. checkpoint-100/scheduler.pt +3 -0
  8. checkpoint-100/special_tokens_map.json +29 -0
  9. checkpoint-100/tokenizer.json +0 -0
  10. checkpoint-100/tokenizer.model +3 -0
  11. checkpoint-100/tokenizer_config.json +50 -0
  12. checkpoint-100/trainer_state.json +114 -0
  13. checkpoint-100/training_args.bin +3 -0
  14. checkpoint-1000/README.md +202 -0
  15. checkpoint-1000/adapter_config.json +36 -0
  16. checkpoint-1000/adapter_model.safetensors +3 -0
  17. checkpoint-1000/optimizer.pt +3 -0
  18. checkpoint-1000/rng_state.pth +3 -0
  19. checkpoint-1000/scheduler.pt +3 -0
  20. checkpoint-1000/special_tokens_map.json +29 -0
  21. checkpoint-1000/tokenizer.json +0 -0
  22. checkpoint-1000/tokenizer.model +3 -0
  23. checkpoint-1000/tokenizer_config.json +50 -0
  24. checkpoint-1000/trainer_state.json +951 -0
  25. checkpoint-1000/training_args.bin +3 -0
  26. checkpoint-1100/README.md +202 -0
  27. checkpoint-1100/adapter_config.json +36 -0
  28. checkpoint-1100/adapter_model.safetensors +3 -0
  29. checkpoint-1100/optimizer.pt +3 -0
  30. checkpoint-1100/rng_state.pth +3 -0
  31. checkpoint-1100/scheduler.pt +3 -0
  32. checkpoint-1100/special_tokens_map.json +29 -0
  33. checkpoint-1100/tokenizer.json +0 -0
  34. checkpoint-1100/tokenizer.model +3 -0
  35. checkpoint-1100/tokenizer_config.json +50 -0
  36. checkpoint-1100/trainer_state.json +1044 -0
  37. checkpoint-1100/training_args.bin +3 -0
  38. checkpoint-1200/README.md +202 -0
  39. checkpoint-1200/adapter_config.json +36 -0
  40. checkpoint-1200/adapter_model.safetensors +3 -0
  41. checkpoint-1200/optimizer.pt +3 -0
  42. checkpoint-1200/rng_state.pth +3 -0
  43. checkpoint-1200/scheduler.pt +3 -0
  44. checkpoint-1200/special_tokens_map.json +29 -0
  45. checkpoint-1200/tokenizer.json +0 -0
  46. checkpoint-1200/tokenizer.model +3 -0
  47. checkpoint-1200/tokenizer_config.json +50 -0
  48. checkpoint-1200/trainer_state.json +1137 -0
  49. checkpoint-1200/training_args.bin +3 -0
  50. checkpoint-1300/README.md +202 -0
README.md CHANGED
@@ -1,99 +1,102 @@
  ---
  license: mit
- library_name: peft
  tags:
- - trl
- - kto
- - generated_from_trainer
  base_model: HuggingFaceH4/zephyr-7b-beta
  model-index:
- - name: WeniGPT-Agents-Zephyr-1.0.25-KTO
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # WeniGPT-Agents-Zephyr-1.0.25-KTO

- This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5
- - Rewards/chosen: -195.8677
- - Rewards/rejected: -165.2624
- - Rewards/margins: -30.6053
- - Kl: 0.0
- - Logps/chosen: -2238.1643
- - Logps/rejected: -1890.7997

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.03
- - training_steps: 1470
- - mixed_precision_training: Native AMP

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/margins | Kl | Logps/chosen | Logps/rejected |
- |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:---------------:|:---:|:------------:|:--------------:|
- | 0.6966 | 0.33 | 50 | 0.5063 | -13.4129 | -12.4081 | -1.0049 | 0.0 | -413.6161 | -362.2560 |
- | 0.754 | 0.66 | 100 | 0.5000 | -174.7515 | -145.9646 | -28.7869 | 0.0 | -2027.0018 | -1697.8218 |
- | 0.6274 | 0.99 | 150 | 0.5 | -195.8329 | -165.1599 | -30.6730 | 0.0 | -2237.8149 | -1889.7742 |
- | 0.642 | 1.32 | 200 | 0.5000 | -195.1430 | -164.6777 | -30.4653 | 0.0 | -2230.9163 | -1884.9520 |
- | 0.6241 | 1.65 | 250 | 0.5000 | -195.1471 | -164.6848 | -30.4623 | 0.0 | -2230.9573 | -1885.0226 |
- | 0.7477 | 1.98 | 300 | 0.5 | -195.8677 | -165.2624 | -30.6053 | 0.0 | -2238.1643 | -1890.7997 |
- | 0.8685 | 2.31 | 350 | 0.5 | -195.8568 | -165.2519 | -30.6049 | 0.0 | -2238.0549 | -1890.6946 |
- | 0.693 | 2.64 | 400 | 0.5 | -195.8341 | -165.2328 | -30.6013 | 0.0 | -2237.8274 | -1890.5028 |
- | 0.686 | 2.97 | 450 | 0.5 | -195.8235 | -165.2227 | -30.6008 | 0.0 | -2237.7224 | -1890.4027 |
- | 0.6119 | 3.3 | 500 | 0.5 | -195.8122 | -165.2139 | -30.5983 | 0.0 | -2237.6084 | -1890.3141 |
- | 0.5902 | 3.63 | 550 | 0.5 | -195.8078 | -165.2129 | -30.5949 | 0.0 | -2237.5649 | -1890.3043 |
- | 0.7106 | 3.96 | 600 | 0.5 | -196.2488 | -165.5701 | -30.6787 | 0.0 | -2241.9751 | -1893.8765 |
- | 0.8232 | 4.29 | 650 | 0.5 | -196.2429 | -165.5582 | -30.6847 | 0.0 | -2241.9155 | -1893.7571 |
- | 0.5881 | 4.62 | 700 | 0.5 | -197.1647 | -166.3029 | -30.8618 | 0.0 | -2251.1340 | -1901.2047 |
- | 0.6156 | 4.95 | 750 | 0.5 | -197.1416 | -166.2842 | -30.8573 | 0.0 | -2250.9023 | -1901.0179 |
- | 0.6291 | 5.28 | 800 | 0.5 | -197.1509 | -166.2928 | -30.8580 | 0.0 | -2250.9958 | -1901.1036 |
- | 0.6285 | 5.61 | 850 | 0.5 | -197.1602 | -166.2982 | -30.8620 | 0.0 | -2251.0884 | -1901.1571 |
- | 0.6918 | 5.94 | 900 | 0.5 | -197.1623 | -166.3002 | -30.8621 | 0.0 | -2251.1104 | -1901.1774 |
- | 0.7869 | 6.27 | 950 | 0.5 | -197.1630 | -166.3040 | -30.8591 | 0.0 | -2251.1169 | -1901.2148 |
- | 0.5483 | 6.6 | 1000 | 0.5 | -197.1648 | -166.2998 | -30.8650 | 0.0 | -2251.1345 | -1901.1730 |
- | 0.7744 | 6.93 | 1050 | 0.5 | -197.5333 | -166.5969 | -30.9364 | 0.0 | -2254.8201 | -1904.1442 |
- | 0.9077 | 7.26 | 1100 | 0.5 | -197.5402 | -166.6008 | -30.9394 | 0.0 | -2254.8884 | -1904.1827 |
- | 0.664 | 7.59 | 1150 | 0.5 | -197.2621 | -166.3788 | -30.8832 | 0.0 | -2252.1074 | -1901.9637 |
- | 0.6126 | 7.92 | 1200 | 0.5 | -197.2483 | -166.3705 | -30.8778 | 0.0 | -2251.9697 | -1901.8805 |
- | 0.8377 | 8.25 | 1250 | 0.5 | -197.1308 | -166.2760 | -30.8547 | 0.0 | -2250.7944 | -1900.9357 |
- | 0.6109 | 8.58 | 1300 | 0.5 | -197.1868 | -166.3199 | -30.8669 | 0.0 | -2251.3545 | -1901.3741 |
- | 0.7432 | 8.91 | 1350 | 0.5 | -197.2601 | -166.3793 | -30.8808 | 0.0 | -2252.0879 | -1901.9680 |
- | 0.8664 | 9.24 | 1400 | 0.5 | -197.1278 | -166.2694 | -30.8584 | 0.0 | -2250.7642 | -1900.8693 |
- | 0.7237 | 9.57 | 1450 | 0.5 | -197.125 | -166.2689 | -30.8561 | 0.0 | -2250.7366 | -1900.8641 |
-
-
  ### Framework versions

- - PEFT 0.10.0
- - Transformers 4.38.2
- - Pytorch 2.1.0+cu118
- - Datasets 2.18.0
- - Tokenizers 0.15.2
  ---
  license: mit
+ library_name: "trl"
  tags:
+ - KTO
+ - WeniGPT
  base_model: HuggingFaceH4/zephyr-7b-beta
  model-index:
+ - name: Weni/WeniGPT-Agents-Zephyr-1.0.25-KTO
  results: []
+ language: ['pt']
  ---

+ # Weni/WeniGPT-Agents-Zephyr-1.0.25-KTO

+ This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on the dataset Weni/wenigpt-agent-1.4.0 with the KTO trainer. It is part of the WeniGPT project for [Weni](https://weni.ai/).
+ Description: Experiment with a new tokenizer configuration for the chat template of Zephyr.

  It achieves the following results on the evaluation set:
+ - eval_loss: 0.5
+ - eval_runtime: 169.8846
+ - eval_samples_per_second: 2.06
+ - eval_steps_per_second: 0.518
+ - eval_rewards/chosen: -195.86773681640625
+ - eval_rewards/rejected: -165.26242065429688
+ - eval_rewards/margins: -30.605329513549805
+ - eval_kl: 0.0
+ - eval_logps/chosen: -2238.164306640625
+ - eval_logps/rejected: -1890.7996826171875
+ - epoch: 9.7

+ ## Intended uses & limitations

+ This model has not been trained to avoid specific instructions.

+ ## Training procedure

+ Fine-tuning was done on the model HuggingFaceH4/zephyr-7b-beta with the following prompt:

+ ```
+ ---------------------
+ System_prompt:
+ Agora você se chama {name}, você é {occupation} e seu objetivo é {chatbot_goal}. O adjetivo que mais define a sua personalidade é {adjective} e você se comporta da seguinte forma:
+ {instructions_formatted}
+
+ Na sua memória você tem esse contexto:
+ {context}
+
+ Lista de requisitos:
+ - Responda de forma natural, mas nunca fale sobre um assunto fora do contexto.
+ - Nunca traga informações do seu próprio conhecimento.
+ - Repito é crucial que você responda usando apenas informações do contexto.
+ - Nunca mencione o contexto fornecido.
+ - Nunca mencione a pergunta fornecida.
+ - Gere a resposta mais útil possível para a pergunta usando informações do conexto acima.
+ - Nunca elabore sobre o porque e como você fez a tarefa, apenas responda.
+
+ ---------------------
+ Question:
+ {question}
+
+ ---------------------
+ Response:
+ {answer}
+
+ ---------------------
+ ```

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
+ - per_device_train_batch_size: 4
+ - per_device_eval_batch_size: 4
  - gradient_accumulation_steps: 4
+ - num_gpus: 1
  - total_train_batch_size: 16
+ - optimizer: AdamW
+ - lr_scheduler_type: cosine
+ - num_steps: 1470
+ - quantization_type: bitsandbytes
+ - LoRA:
+   - bits: 4
+   - use_exllama: True
+   - device_map: auto
+   - use_cache: False
+   - lora_r: 8
+   - lora_alpha: 16
+   - lora_dropout: 0.05
+   - bias: none
+   - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'lm_head', 'embed_tokens']
+   - task_type: CAUSAL_LM

  ### Training results

  ### Framework versions

+ - transformers==4.38.2
+ - datasets==2.18.0
+ - peft==0.10.0
+ - safetensors==0.4.2
+ - evaluate==0.4.1
+ - bitsandbytes==0.43
+ - huggingface_hub==0.22.2
+ - seqeval==1.2.2
+ - optimum==1.18.1
+ - auto-gptq==0.7.1
+ - gpustat==1.1.1
+ - deepspeed==0.14.0
+ - wandb==0.16.6
+ - trl==0.8.1
+ - accelerate==0.29.2
+ - coloredlogs==15.0.1
+ - traitlets==5.14.2
+ - autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.4/autoawq-0.2.4+cu118-cp310-cp310-linux_x86_64.whl
+
+ ### Hardware
+ - Cloud provider: runpod.io
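The placeholder fields in the training prompt above can be filled with plain `str.format`. A minimal sketch of that step (the `build_prompt` helper and the example values are hypothetical, not part of the released code; the template excerpt mirrors the card):

```python
# Illustrative: fill the WeniGPT system-prompt template from the model card.
# The field names mirror the {placeholders} shown in the card; the helper is
# a sketch, not part of the released training code.
PROMPT_TEMPLATE = """---------------------
System_prompt:
Agora você se chama {name}, você é {occupation} e seu objetivo é {chatbot_goal}. O adjetivo que mais define a sua personalidade é {adjective} e você se comporta da seguinte forma:
{instructions_formatted}

Na sua memória você tem esse contexto:
{context}

---------------------
Question:
{question}

---------------------
Response:
"""

def build_prompt(**fields) -> str:
    """Substitute the template placeholders with concrete values."""
    return PROMPT_TEMPLATE.format(**fields)

prompt = build_prompt(
    name="Ana",
    occupation="atendente virtual",
    chatbot_goal="ajudar clientes",
    adjective="amigável",
    instructions_formatted="- Seja breve.",
    context="Horário de funcionamento: 9h às 18h.",
    question="Qual o horário de funcionamento?",
)
print(prompt)
```

At inference time the filled prompt would then be wrapped in the Zephyr chat template before tokenization.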
checkpoint-100/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: HuggingFaceH4/zephyr-7b-beta
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
checkpoint-100/adapter_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "HuggingFaceH4/zephyr-7b-beta",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "gate_proj",
+     "up_proj",
+     "k_proj",
+     "q_proj",
+     "down_proj",
+     "v_proj",
+     "o_proj",
+     "lm_head",
+     "embed_tokens"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
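The size of this adapter follows directly from the config above: with rank `r`, LoRA adds two trainable matrices per targeted `d_out x d_in` linear layer (`d_out x r` and `r x d_in`), i.e. `r * (d_in + d_out)` extra parameters. A quick back-of-the-envelope check (the 4096 hidden size is Zephyr's, taken here as an assumption for illustration):

```python
# Count the extra trainable parameters LoRA adds to one linear layer.
# With rank r, a frozen d_out x d_in weight gains matrices of shape
# (d_out, r) and (r, d_in), so r * (d_in + d_out) parameters in total.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

# Example: a 4096 x 4096 attention projection (Zephyr/Mistral hidden size,
# assumed here) with the r=8 from adapter_config.json above.
print(lora_params(4096, 4096, 8))  # 65536
```

Summing this over all nine `target_modules` across the model's layers gives the adapter's total trainable parameter count.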
checkpoint-100/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f412f4b58c86c1795d0dbf5dc47a10ddbdc92377fd3aa4ac59c68d195543703
+ size 1134834064
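The three-line blobs shown for the binary files in this diff are Git LFS pointer files, not the weights themselves: each records the spec version, a `sha256` object id, and the size in bytes of the real file. A minimal parser for that format (illustrative sketch, not part of any tooling here):

```python
# Parse a Git LFS pointer file into its key/value fields.
# Pointer files are "key value" lines: version, oid, size.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9f412f4b58c86c1795d0dbf5dc47a10ddbdc92377fd3aa4ac59c68d195543703
size 1134834064"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 1134834064
```

The `size 1134834064` line, for instance, says the actual `adapter_model.safetensors` is about 1.1 GB.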
checkpoint-100/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3ee0cb59d1cfd6f7f29e012f60709106a6385d247c460dc502c1caf5ce6576b
+ size 172772766
checkpoint-100/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9196a1e708bf24d6abba41cce3f8558820acc3e50f9394c5955e29eb41ffea3d
+ size 14244
checkpoint-100/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a95477309af9566acf8df625eff5b2fe03c3566409932f50c562e95a7b57865
+ size 1064
checkpoint-100/special_tokens_map.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<unk>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-100/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-100/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
checkpoint-100/tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "max_lenght": 8192,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<unk>",
+   "padding": true,
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "truncation_side": "left",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }
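The `chat_template` above is the Jinja template the tokenizer uses (via `tokenizer.apply_chat_template`) to serialize a conversation: each message becomes `<|role|>`, a newline, the content, and `</s>`, with a trailing `<|assistant|>` when a generation prompt is requested. A plain-Python approximation to make the rendered shape concrete (a sketch only; exact whitespace is governed by the Jinja template itself):

```python
# Approximate re-implementation of the Zephyr chat template above, to show
# the rendered format. In practice, use tokenizer.apply_chat_template.
def render_chat(messages, eos_token="</s>", add_generation_prompt=True):
    parts = []
    for m in messages:
        # <|user|> / <|system|> / <|assistant|> header, then content + EOS.
        parts.append(f"<|{m['role']}|>\n{m['content']}{eos_token}")
    if add_generation_prompt:
        parts.append("<|assistant|>")  # cue the model to answer next
    return "\n".join(parts)

text = render_chat([
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Oi!"},
])
print(text)
```

This is the "new tokenizer configuration for the chat template" the experiment description refers to.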
checkpoint-100/trainer_state.json ADDED
@@ -0,0 +1,114 @@
+ {
+   "best_metric": 0.5000000596046448,
+   "best_model_checkpoint": "./zephyr/10-04-24-Weni-WeniGPT-Agents-Zephyr-1.0.25-KTO_Experiment with a new tokenizer configuration for chat template of zephyr-2_max_steps-1470_batch_16_2024-04-10_ppid_9/checkpoint-100",
+   "epoch": 0.6600660066006601,
+   "eval_steps": 50,
+   "global_step": 100,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.13,
+       "grad_norm": 57.293792724609375,
+       "kl": 0.03853478282690048,
+       "learning_rate": 6.222222222222222e-05,
+       "logps/chosen": NaN,
+       "logps/rejected": NaN,
+       "loss": 0.7078,
+       "rewards/chosen": NaN,
+       "rewards/margins": NaN,
+       "rewards/rejected": NaN,
+       "step": 20
+     },
+     {
+       "epoch": 0.26,
+       "grad_norm": 112.50944519042969,
+       "kl": 3.2648494243621826,
+       "learning_rate": 0.00014666666666666666,
+       "logps/chosen": NaN,
+       "logps/rejected": NaN,
+       "loss": 0.6966,
+       "rewards/chosen": NaN,
+       "rewards/margins": NaN,
+       "rewards/rejected": NaN,
+       "step": 40
+     },
+     {
+       "epoch": 0.33,
+       "eval_kl": 0.0,
+       "eval_logps/chosen": -413.6161193847656,
+       "eval_logps/rejected": -362.2559509277344,
+       "eval_loss": 0.5063381791114807,
+       "eval_rewards/chosen": -13.412939071655273,
+       "eval_rewards/margins": -1.0048810243606567,
+       "eval_rewards/rejected": -12.408059120178223,
+       "eval_runtime": 170.1826,
+       "eval_samples_per_second": 2.057,
+       "eval_steps_per_second": 0.517,
+       "step": 50
+     },
+     {
+       "epoch": 0.4,
+       "grad_norm": 19.94582748413086,
+       "kl": 0.45922356843948364,
+       "learning_rate": 0.00019887719298245616,
+       "logps/chosen": NaN,
+       "logps/rejected": NaN,
+       "loss": 0.5743,
+       "rewards/chosen": NaN,
+       "rewards/margins": NaN,
+       "rewards/rejected": NaN,
+       "step": 60
+     },
+     {
+       "epoch": 0.53,
+       "grad_norm": 79.92957305908203,
+       "kl": 0.0,
+       "learning_rate": 0.0001960701754385965,
+       "logps/chosen": NaN,
+       "logps/rejected": NaN,
+       "loss": 0.6108,
+       "rewards/chosen": NaN,
+       "rewards/margins": NaN,
+       "rewards/rejected": NaN,
+       "step": 80
+     },
+     {
+       "epoch": 0.66,
+       "grad_norm": 0.06103940308094025,
+       "kl": 0.0,
+       "learning_rate": 0.00019326315789473686,
+       "logps/chosen": NaN,
+       "logps/rejected": NaN,
+       "loss": 0.754,
+       "rewards/chosen": NaN,
+       "rewards/margins": NaN,
+       "rewards/rejected": NaN,
+       "step": 100
+     },
+     {
+       "epoch": 0.66,
+       "eval_kl": 0.0,
+       "eval_logps/chosen": -2027.0018310546875,
+       "eval_logps/rejected": -1697.82177734375,
+       "eval_loss": 0.5000000596046448,
+       "eval_rewards/chosen": -174.75149536132812,
+       "eval_rewards/margins": -28.786863327026367,
+       "eval_rewards/rejected": -145.96463012695312,
+       "eval_runtime": 170.0562,
+       "eval_samples_per_second": 2.058,
+       "eval_steps_per_second": 0.517,
+       "step": 100
+     }
+   ],
+   "logging_steps": 20,
+   "max_steps": 1470,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 10,
+   "save_steps": 100,
+   "total_flos": 0.0,
+   "train_batch_size": 4,
+   "trial_name": null,
+   "trial_params": null
+ }
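The `best_model_checkpoint` in this state corresponds to the evaluation entry with the lowest `eval_loss` in `log_history`. A sketch of that selection over an inline excerpt of the state above (training-log entries carry `loss` but no `eval_loss`, so they are filtered out first):

```python
# Pick the best checkpoint from a trainer_state.json-style log_history by
# minimum eval_loss. The data is an inline excerpt of the state above.
state = {
    "log_history": [
        {"epoch": 0.33, "step": 50, "eval_loss": 0.5063381791114807},
        {"epoch": 0.66, "step": 100, "eval_loss": 0.5000000596046448},
        {"epoch": 0.66, "step": 100, "loss": 0.754},  # train log, no eval_loss
    ]
}

evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"])  # 100
```

This matches the `best_metric` of 0.5000000596046448 recorded at step 100.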
checkpoint-100/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae5309801a19049c58de3649400afbb558334e14e33dff69ca022789cf2400ea
+ size 5688
checkpoint-1000/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: HuggingFaceH4/zephyr-7b-beta
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
checkpoint-1000/adapter_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "HuggingFaceH4/zephyr-7b-beta",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "up_proj",
+ "k_proj",
+ "q_proj",
+ "down_proj",
+ "v_proj",
+ "o_proj",
+ "lm_head",
+ "embed_tokens"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
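The adapter config above sets `r: 8` and `lora_alpha: 16`, so the low-rank update each LoRA layer adds to its base weight is scaled by `lora_alpha / r = 2.0`. A minimal sketch of reading those values (plain Python; the JSON below is a hand-copied subset of the config above, not the full file):

```python
import json

# Subset of checkpoint-1000/adapter_config.json, copied from the diff above.
adapter_config = json.loads("""{
  "r": 8,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "target_modules": ["gate_proj", "up_proj", "k_proj", "q_proj",
                     "down_proj", "v_proj", "o_proj", "lm_head", "embed_tokens"]
}""")

# LoRA replaces W with W + (lora_alpha / r) * B @ A, so the effective
# multiplier on the low-rank update is:
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(scaling)  # 2.0

# Nine module types get adapters, including lm_head and embed_tokens.
print(len(adapter_config["target_modules"]))  # 9
```

In practice this config is consumed for you when the checkpoint directory is passed to `peft.PeftModel.from_pretrained` on top of the `HuggingFaceH4/zephyr-7b-beta` base model.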
checkpoint-1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a6735a60251dd555cec76162e1919767b9354745a72e80c1859d1992e67e76a
+ size 1134834064
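The safetensors entry above is not the weights themselves but a Git LFS pointer file: three `key value` lines giving the spec version, the content hash, and the payload size in bytes. A small sketch of parsing such a pointer (the function name is ours; the pointer text is copied from the diff above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9a6735a60251dd555cec76162e1919767b9354745a72e80c1859d1992e67e76a
size 1134834064
"""

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":", 1)   # hash algorithm and hex digest
size_mb = int(info["size"]) / 1024 / 1024  # ~1082 MiB adapter file
print(algo, round(size_mb))
```

Cloning the repo without `git lfs pull` leaves only these small pointer files on disk, which is why the adapter weights must be fetched through LFS (or via `huggingface_hub`) before loading.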
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92b9e2da41a1603a53b6db76ca19441521bf24e991bafc75689109d861afce43
+ size 172772766
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3bd3cafcd141485c5526689e7070ba65dab1e4639fbae44141ae41439003c1f
+ size 14244
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7264db995851472cbc0a4e596f81fcf5a6c9d14b6cfc09096e5a48386a62256a
+ size 1064
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "additional_special_tokens": [
+ "<unk>",
+ "<s>",
+ "</s>"
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<unk>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<unk>",
+ "<s>",
+ "</s>"
+ ],
+ "bos_token": "<s>",
+ "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": true,
+ "max_lenght": 8192,
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<unk>",
+ "padding": true,
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "truncation_side": "left",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": true
+ }
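The `chat_template` in the tokenizer config above renders each message as a `<|role|>` header, the content, and the EOS token, with a trailing `<|assistant|>` cue when a generation prompt is requested. A rough pure-Python rendering of the same layout, without jinja (the function name is ours, and exact whitespace may differ slightly from what `tokenizer.apply_chat_template` produces):

```python
def render_zephyr(messages, eos_token="</s>", add_generation_prompt=True):
    # Mirrors the jinja chat_template: each message becomes
    # "<|role|>\ncontent</s>"; a trailing "<|assistant|>" cues generation.
    parts = [f"<|{m['role']}|>\n{m['content']}{eos_token}" for m in messages]
    prompt = "\n".join(parts)
    if add_generation_prompt:
        prompt += "\n<|assistant|>"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = render_zephyr(messages)
print(prompt)
```

The commit message's "Experiment with a new tokenizer configuration for chat template" refers to this field; with the real tokenizer you would call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` instead of hand-rolling the string.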
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,951 @@
+ {
+ "best_metric": 0.5,
+ "best_model_checkpoint": "./zephyr/10-04-24-Weni-WeniGPT-Agents-Zephyr-1.0.25-KTO_Experiment with a new tokenizer configuration for chat template of zephyr-2_max_steps-1470_batch_16_2024-04-10_ppid_9/checkpoint-300",
+ "epoch": 6.600660066006601,
+ "eval_steps": 50,
+ "global_step": 1000,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
11
+ {
12
+ "epoch": 0.13,
13
+ "grad_norm": 57.293792724609375,
14
+ "kl": 0.03853478282690048,
15
+ "learning_rate": 6.222222222222222e-05,
16
+ "logps/chosen": NaN,
17
+ "logps/rejected": NaN,
18
+ "loss": 0.7078,
19
+ "rewards/chosen": NaN,
20
+ "rewards/margins": NaN,
21
+ "rewards/rejected": NaN,
22
+ "step": 20
23
+ },
24
+ {
25
+ "epoch": 0.26,
26
+ "grad_norm": 112.50944519042969,
27
+ "kl": 3.2648494243621826,
28
+ "learning_rate": 0.00014666666666666666,
29
+ "logps/chosen": NaN,
30
+ "logps/rejected": NaN,
31
+ "loss": 0.6966,
32
+ "rewards/chosen": NaN,
33
+ "rewards/margins": NaN,
34
+ "rewards/rejected": NaN,
35
+ "step": 40
36
+ },
37
+ {
38
+ "epoch": 0.33,
39
+ "eval_kl": 0.0,
40
+ "eval_logps/chosen": -413.6161193847656,
41
+ "eval_logps/rejected": -362.2559509277344,
42
+ "eval_loss": 0.5063381791114807,
43
+ "eval_rewards/chosen": -13.412939071655273,
44
+ "eval_rewards/margins": -1.0048810243606567,
45
+ "eval_rewards/rejected": -12.408059120178223,
46
+ "eval_runtime": 170.1826,
47
+ "eval_samples_per_second": 2.057,
48
+ "eval_steps_per_second": 0.517,
49
+ "step": 50
50
+ },
51
+ {
52
+ "epoch": 0.4,
53
+ "grad_norm": 19.94582748413086,
54
+ "kl": 0.45922356843948364,
55
+ "learning_rate": 0.00019887719298245616,
56
+ "logps/chosen": NaN,
57
+ "logps/rejected": NaN,
58
+ "loss": 0.5743,
59
+ "rewards/chosen": NaN,
60
+ "rewards/margins": NaN,
61
+ "rewards/rejected": NaN,
62
+ "step": 60
63
+ },
64
+ {
65
+ "epoch": 0.53,
66
+ "grad_norm": 79.92957305908203,
67
+ "kl": 0.0,
68
+ "learning_rate": 0.0001960701754385965,
69
+ "logps/chosen": NaN,
70
+ "logps/rejected": NaN,
71
+ "loss": 0.6108,
72
+ "rewards/chosen": NaN,
73
+ "rewards/margins": NaN,
74
+ "rewards/rejected": NaN,
75
+ "step": 80
76
+ },
77
+ {
78
+ "epoch": 0.66,
79
+ "grad_norm": 0.06103940308094025,
80
+ "kl": 0.0,
81
+ "learning_rate": 0.00019326315789473686,
82
+ "logps/chosen": NaN,
83
+ "logps/rejected": NaN,
84
+ "loss": 0.754,
85
+ "rewards/chosen": NaN,
86
+ "rewards/margins": NaN,
87
+ "rewards/rejected": NaN,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.66,
92
+ "eval_kl": 0.0,
93
+ "eval_logps/chosen": -2027.0018310546875,
94
+ "eval_logps/rejected": -1697.82177734375,
95
+ "eval_loss": 0.5000000596046448,
96
+ "eval_rewards/chosen": -174.75149536132812,
97
+ "eval_rewards/margins": -28.786863327026367,
98
+ "eval_rewards/rejected": -145.96463012695312,
99
+ "eval_runtime": 170.0562,
100
+ "eval_samples_per_second": 2.058,
101
+ "eval_steps_per_second": 0.517,
102
+ "step": 100
103
+ },
104
+ {
105
+ "epoch": 0.79,
106
+ "grad_norm": 0.0,
107
+ "kl": 0.0,
108
+ "learning_rate": 0.0001904561403508772,
109
+ "logps/chosen": NaN,
110
+ "logps/rejected": NaN,
111
+ "loss": 0.95,
112
+ "rewards/chosen": NaN,
113
+ "rewards/margins": NaN,
114
+ "rewards/rejected": NaN,
115
+ "step": 120
116
+ },
117
+ {
118
+ "epoch": 0.92,
119
+ "grad_norm": 0.0,
120
+ "kl": 0.0,
121
+ "learning_rate": 0.00018764912280701756,
122
+ "logps/chosen": NaN,
123
+ "logps/rejected": NaN,
124
+ "loss": 0.6274,
125
+ "rewards/chosen": NaN,
126
+ "rewards/margins": NaN,
127
+ "rewards/rejected": NaN,
128
+ "step": 140
129
+ },
130
+ {
131
+ "epoch": 0.99,
132
+ "eval_kl": 0.0,
133
+ "eval_logps/chosen": -2237.81494140625,
134
+ "eval_logps/rejected": -1889.774169921875,
135
+ "eval_loss": 0.5,
136
+ "eval_rewards/chosen": -195.83285522460938,
137
+ "eval_rewards/margins": -30.672954559326172,
138
+ "eval_rewards/rejected": -165.15989685058594,
139
+ "eval_runtime": 169.8795,
140
+ "eval_samples_per_second": 2.06,
141
+ "eval_steps_per_second": 0.518,
142
+ "step": 150
143
+ },
144
+ {
145
+ "epoch": 1.06,
146
+ "grad_norm": 0.0,
147
+ "kl": 0.0,
148
+ "learning_rate": 0.0001848421052631579,
149
+ "logps/chosen": NaN,
150
+ "logps/rejected": NaN,
151
+ "loss": 0.6387,
152
+ "rewards/chosen": NaN,
153
+ "rewards/margins": NaN,
154
+ "rewards/rejected": NaN,
155
+ "step": 160
156
+ },
157
+ {
158
+ "epoch": 1.19,
159
+ "grad_norm": 0.0,
160
+ "kl": 0.0,
161
+ "learning_rate": 0.00018203508771929826,
162
+ "logps/chosen": NaN,
163
+ "logps/rejected": NaN,
164
+ "loss": 0.8327,
165
+ "rewards/chosen": NaN,
166
+ "rewards/margins": NaN,
167
+ "rewards/rejected": NaN,
168
+ "step": 180
169
+ },
170
+ {
171
+ "epoch": 1.32,
172
+ "grad_norm": 0.0,
173
+ "kl": 0.0,
174
+ "learning_rate": 0.00017922807017543862,
175
+ "logps/chosen": NaN,
176
+ "logps/rejected": NaN,
177
+ "loss": 0.642,
178
+ "rewards/chosen": NaN,
179
+ "rewards/margins": NaN,
180
+ "rewards/rejected": NaN,
181
+ "step": 200
182
+ },
183
+ {
184
+ "epoch": 1.32,
185
+ "eval_kl": 0.0,
186
+ "eval_logps/chosen": -2230.916259765625,
187
+ "eval_logps/rejected": -1884.9520263671875,
188
+ "eval_loss": 0.5000000596046448,
189
+ "eval_rewards/chosen": -195.14297485351562,
190
+ "eval_rewards/margins": -30.465293884277344,
191
+ "eval_rewards/rejected": -164.67767333984375,
192
+ "eval_runtime": 170.1489,
193
+ "eval_samples_per_second": 2.057,
194
+ "eval_steps_per_second": 0.517,
195
+ "step": 200
196
+ },
197
+ {
198
+ "epoch": 1.45,
199
+ "grad_norm": 0.0,
200
+ "kl": 0.0,
201
+ "learning_rate": 0.00017642105263157896,
202
+ "logps/chosen": NaN,
203
+ "logps/rejected": NaN,
204
+ "loss": 0.7493,
205
+ "rewards/chosen": NaN,
206
+ "rewards/margins": NaN,
207
+ "rewards/rejected": NaN,
208
+ "step": 220
209
+ },
210
+ {
211
+ "epoch": 1.58,
212
+ "grad_norm": 0.0,
213
+ "kl": 0.0,
214
+ "learning_rate": 0.0001736140350877193,
215
+ "logps/chosen": NaN,
216
+ "logps/rejected": NaN,
217
+ "loss": 0.6241,
218
+ "rewards/chosen": NaN,
219
+ "rewards/margins": NaN,
220
+ "rewards/rejected": NaN,
221
+ "step": 240
222
+ },
223
+ {
224
+ "epoch": 1.65,
225
+ "eval_kl": 0.0,
226
+ "eval_logps/chosen": -2230.957275390625,
227
+ "eval_logps/rejected": -1885.0225830078125,
228
+ "eval_loss": 0.5000000596046448,
229
+ "eval_rewards/chosen": -195.14706420898438,
230
+ "eval_rewards/margins": -30.462318420410156,
231
+ "eval_rewards/rejected": -164.68475341796875,
232
+ "eval_runtime": 170.1092,
233
+ "eval_samples_per_second": 2.058,
234
+ "eval_steps_per_second": 0.517,
235
+ "step": 250
236
+ },
237
+ {
238
+ "epoch": 1.72,
239
+ "grad_norm": 0.0,
240
+ "kl": 0.0,
241
+ "learning_rate": 0.00017080701754385965,
242
+ "logps/chosen": NaN,
243
+ "logps/rejected": NaN,
244
+ "loss": 0.9621,
245
+ "rewards/chosen": NaN,
246
+ "rewards/margins": NaN,
247
+ "rewards/rejected": NaN,
248
+ "step": 260
249
+ },
250
+ {
251
+ "epoch": 1.85,
252
+ "grad_norm": 0.0,
253
+ "kl": 0.0,
254
+ "learning_rate": 0.000168,
255
+ "logps/chosen": NaN,
256
+ "logps/rejected": NaN,
257
+ "loss": 0.7279,
258
+ "rewards/chosen": NaN,
259
+ "rewards/margins": NaN,
260
+ "rewards/rejected": NaN,
261
+ "step": 280
262
+ },
263
+ {
264
+ "epoch": 1.98,
265
+ "grad_norm": 0.0,
266
+ "kl": 0.0,
267
+ "learning_rate": 0.00016519298245614035,
268
+ "logps/chosen": NaN,
269
+ "logps/rejected": NaN,
270
+ "loss": 0.7477,
271
+ "rewards/chosen": NaN,
272
+ "rewards/margins": NaN,
273
+ "rewards/rejected": NaN,
274
+ "step": 300
275
+ },
276
+ {
277
+ "epoch": 1.98,
278
+ "eval_kl": 0.0,
279
+ "eval_logps/chosen": -2238.164306640625,
280
+ "eval_logps/rejected": -1890.7996826171875,
281
+ "eval_loss": 0.5,
282
+ "eval_rewards/chosen": -195.86773681640625,
283
+ "eval_rewards/margins": -30.605329513549805,
284
+ "eval_rewards/rejected": -165.26242065429688,
285
+ "eval_runtime": 170.0647,
286
+ "eval_samples_per_second": 2.058,
287
+ "eval_steps_per_second": 0.517,
288
+ "step": 300
289
+ },
290
+ {
291
+ "epoch": 2.11,
292
+ "grad_norm": 0.0,
293
+ "kl": 0.0,
294
+ "learning_rate": 0.00016238596491228072,
295
+ "logps/chosen": NaN,
296
+ "logps/rejected": NaN,
297
+ "loss": 0.7111,
298
+ "rewards/chosen": NaN,
299
+ "rewards/margins": NaN,
300
+ "rewards/rejected": NaN,
301
+ "step": 320
302
+ },
303
+ {
304
+ "epoch": 2.24,
305
+ "grad_norm": 0.0,
306
+ "kl": 0.0,
307
+ "learning_rate": 0.00015957894736842105,
308
+ "logps/chosen": NaN,
309
+ "logps/rejected": NaN,
310
+ "loss": 0.8685,
311
+ "rewards/chosen": NaN,
312
+ "rewards/margins": NaN,
313
+ "rewards/rejected": NaN,
314
+ "step": 340
315
+ },
316
+ {
317
+ "epoch": 2.31,
318
+ "eval_kl": 0.0,
319
+ "eval_logps/chosen": -2238.054931640625,
320
+ "eval_logps/rejected": -1890.694580078125,
321
+ "eval_loss": 0.5,
322
+ "eval_rewards/chosen": -195.85682678222656,
323
+ "eval_rewards/margins": -30.604921340942383,
324
+ "eval_rewards/rejected": -165.2519073486328,
325
+ "eval_runtime": 170.0528,
326
+ "eval_samples_per_second": 2.058,
327
+ "eval_steps_per_second": 0.517,
328
+ "step": 350
329
+ },
330
+ {
331
+ "epoch": 2.38,
332
+ "grad_norm": 0.0,
333
+ "kl": 0.0,
334
+ "learning_rate": 0.00015677192982456142,
335
+ "logps/chosen": NaN,
336
+ "logps/rejected": NaN,
337
+ "loss": 0.6905,
338
+ "rewards/chosen": NaN,
339
+ "rewards/margins": NaN,
340
+ "rewards/rejected": NaN,
341
+ "step": 360
342
+ },
343
+ {
344
+ "epoch": 2.51,
345
+ "grad_norm": 0.0,
346
+ "kl": 0.0,
347
+ "learning_rate": 0.00015396491228070175,
348
+ "logps/chosen": NaN,
349
+ "logps/rejected": NaN,
350
+ "loss": 0.736,
351
+ "rewards/chosen": NaN,
352
+ "rewards/margins": NaN,
353
+ "rewards/rejected": NaN,
354
+ "step": 380
355
+ },
356
+ {
357
+ "epoch": 2.64,
358
+ "grad_norm": 0.0,
359
+ "kl": 0.0,
360
+ "learning_rate": 0.00015115789473684211,
361
+ "logps/chosen": NaN,
362
+ "logps/rejected": NaN,
363
+ "loss": 0.693,
364
+ "rewards/chosen": NaN,
365
+ "rewards/margins": NaN,
366
+ "rewards/rejected": NaN,
367
+ "step": 400
368
+ },
369
+ {
370
+ "epoch": 2.64,
371
+ "eval_kl": 0.0,
372
+ "eval_logps/chosen": -2237.827392578125,
373
+ "eval_logps/rejected": -1890.5028076171875,
374
+ "eval_loss": 0.5,
375
+ "eval_rewards/chosen": -195.83407592773438,
376
+ "eval_rewards/margins": -30.601318359375,
377
+ "eval_rewards/rejected": -165.23275756835938,
378
+ "eval_runtime": 170.2445,
379
+ "eval_samples_per_second": 2.056,
380
+ "eval_steps_per_second": 0.517,
381
+ "step": 400
382
+ },
383
+ {
384
+ "epoch": 2.77,
385
+ "grad_norm": 8.788210266175156e-07,
386
+ "kl": 0.0,
387
+ "learning_rate": 0.00014835087719298245,
388
+ "logps/chosen": NaN,
389
+ "logps/rejected": NaN,
390
+ "loss": 0.8652,
391
+ "rewards/chosen": NaN,
392
+ "rewards/margins": NaN,
393
+ "rewards/rejected": NaN,
394
+ "step": 420
395
+ },
396
+ {
397
+ "epoch": 2.9,
398
+ "grad_norm": 0.0,
399
+ "kl": 0.0,
400
+ "learning_rate": 0.0001455438596491228,
401
+ "logps/chosen": NaN,
402
+ "logps/rejected": NaN,
403
+ "loss": 0.686,
404
+ "rewards/chosen": NaN,
405
+ "rewards/margins": NaN,
406
+ "rewards/rejected": NaN,
407
+ "step": 440
408
+ },
409
+ {
410
+ "epoch": 2.97,
411
+ "eval_kl": 0.0,
412
+ "eval_logps/chosen": -2237.722412109375,
413
+ "eval_logps/rejected": -1890.4027099609375,
414
+ "eval_loss": 0.5,
415
+ "eval_rewards/chosen": -195.82354736328125,
416
+ "eval_rewards/margins": -30.600812911987305,
417
+ "eval_rewards/rejected": -165.22274780273438,
418
+ "eval_runtime": 170.3429,
419
+ "eval_samples_per_second": 2.055,
420
+ "eval_steps_per_second": 0.517,
421
+ "step": 450
422
+ },
423
+ {
424
+ "epoch": 3.04,
425
+ "grad_norm": 0.0,
426
+ "kl": 0.0,
427
+ "learning_rate": 0.00014273684210526318,
428
+ "logps/chosen": NaN,
429
+ "logps/rejected": NaN,
430
+ "loss": 0.6858,
431
+ "rewards/chosen": NaN,
432
+ "rewards/margins": NaN,
433
+ "rewards/rejected": NaN,
434
+ "step": 460
435
+ },
436
+ {
437
+ "epoch": 3.17,
438
+ "grad_norm": 0.0,
439
+ "kl": 0.0,
440
+ "learning_rate": 0.0001399298245614035,
441
+ "logps/chosen": NaN,
442
+ "logps/rejected": NaN,
443
+ "loss": 0.8479,
444
+ "rewards/chosen": NaN,
445
+ "rewards/margins": NaN,
446
+ "rewards/rejected": NaN,
447
+ "step": 480
448
+ },
449
+ {
450
+ "epoch": 3.3,
451
+ "grad_norm": 0.0,
452
+ "kl": 0.0,
453
+ "learning_rate": 0.00013712280701754388,
454
+ "logps/chosen": NaN,
455
+ "logps/rejected": NaN,
456
+ "loss": 0.6119,
457
+ "rewards/chosen": NaN,
458
+ "rewards/margins": NaN,
459
+ "rewards/rejected": NaN,
460
+ "step": 500
461
+ },
462
+ {
463
+ "epoch": 3.3,
464
+ "eval_kl": 0.0,
465
+ "eval_logps/chosen": -2237.6083984375,
466
+ "eval_logps/rejected": -1890.3140869140625,
467
+ "eval_loss": 0.5,
468
+ "eval_rewards/chosen": -195.81216430664062,
469
+ "eval_rewards/margins": -30.598268508911133,
470
+ "eval_rewards/rejected": -165.21388244628906,
471
+ "eval_runtime": 169.9488,
472
+ "eval_samples_per_second": 2.059,
473
+ "eval_steps_per_second": 0.518,
474
+ "step": 500
475
+ },
476
+ {
477
+ "epoch": 3.43,
478
+ "grad_norm": 0.0,
479
+ "kl": 0.0,
480
+ "learning_rate": 0.0001343157894736842,
481
+ "logps/chosen": NaN,
482
+ "logps/rejected": NaN,
483
+ "loss": 0.7107,
484
+ "rewards/chosen": NaN,
485
+ "rewards/margins": NaN,
486
+ "rewards/rejected": NaN,
487
+ "step": 520
488
+ },
489
+ {
490
+ "epoch": 3.56,
491
+ "grad_norm": 0.0,
492
+ "kl": 0.0,
493
+ "learning_rate": 0.00013150877192982455,
494
+ "logps/chosen": NaN,
495
+ "logps/rejected": NaN,
496
+ "loss": 0.5902,
497
+ "rewards/chosen": NaN,
498
+ "rewards/margins": NaN,
499
+ "rewards/rejected": NaN,
500
+ "step": 540
501
+ },
502
+ {
503
+ "epoch": 3.63,
504
+ "eval_kl": 0.0,
505
+ "eval_logps/chosen": -2237.56494140625,
506
+ "eval_logps/rejected": -1890.3043212890625,
507
+ "eval_loss": 0.5,
508
+ "eval_rewards/chosen": -195.80784606933594,
509
+ "eval_rewards/margins": -30.59491539001465,
510
+ "eval_rewards/rejected": -165.21290588378906,
511
+ "eval_runtime": 169.9756,
512
+ "eval_samples_per_second": 2.059,
513
+ "eval_steps_per_second": 0.518,
514
+ "step": 550
515
+ },
516
+ {
517
+ "epoch": 3.7,
518
+ "grad_norm": 0.0,
519
+ "kl": 0.0,
520
+ "learning_rate": 0.0001287017543859649,
521
+ "logps/chosen": NaN,
522
+ "logps/rejected": NaN,
523
+ "loss": 0.9042,
524
+ "rewards/chosen": NaN,
525
+ "rewards/margins": NaN,
526
+ "rewards/rejected": NaN,
527
+ "step": 560
528
+ },
529
+ {
530
+ "epoch": 3.83,
531
+ "grad_norm": 0.0,
532
+ "kl": 0.0,
533
+ "learning_rate": 0.00012589473684210527,
534
+ "logps/chosen": NaN,
535
+ "logps/rejected": NaN,
536
+ "loss": 0.7268,
537
+ "rewards/chosen": NaN,
538
+ "rewards/margins": NaN,
539
+ "rewards/rejected": NaN,
540
+ "step": 580
541
+ },
542
+ {
543
+ "epoch": 3.96,
544
+ "grad_norm": 0.0,
545
+ "kl": 0.0,
546
+ "learning_rate": 0.00012308771929824564,
547
+ "logps/chosen": NaN,
548
+ "logps/rejected": NaN,
549
+ "loss": 0.7106,
550
+ "rewards/chosen": NaN,
551
+ "rewards/margins": NaN,
552
+ "rewards/rejected": NaN,
553
+ "step": 600
554
+ },
555
+ {
556
+ "epoch": 3.96,
557
+ "eval_kl": 0.0,
558
+ "eval_logps/chosen": -2241.97509765625,
559
+ "eval_logps/rejected": -1893.87646484375,
560
+ "eval_loss": 0.5,
561
+ "eval_rewards/chosen": -196.24884033203125,
562
+ "eval_rewards/margins": -30.67871856689453,
563
+ "eval_rewards/rejected": -165.57012939453125,
564
+ "eval_runtime": 169.9427,
565
+ "eval_samples_per_second": 2.06,
566
+ "eval_steps_per_second": 0.518,
567
+ "step": 600
568
+ },
569
+ {
570
+ "epoch": 4.09,
571
+ "grad_norm": 0.0,
572
+ "kl": 0.0,
573
+ "learning_rate": 0.00012028070175438597,
574
+ "logps/chosen": NaN,
575
+ "logps/rejected": NaN,
576
+ "loss": 0.6829,
577
+ "rewards/chosen": NaN,
578
+ "rewards/margins": NaN,
579
+ "rewards/rejected": NaN,
580
+ "step": 620
581
+ },
582
+ {
583
+ "epoch": 4.22,
584
+ "grad_norm": 0.0,
585
+ "kl": 0.0,
586
+ "learning_rate": 0.00011747368421052631,
587
+ "logps/chosen": NaN,
588
+ "logps/rejected": NaN,
589
+ "loss": 0.8232,
590
+ "rewards/chosen": NaN,
591
+ "rewards/margins": NaN,
592
+ "rewards/rejected": NaN,
593
+ "step": 640
594
+ },
595
+ {
596
+ "epoch": 4.29,
597
+ "eval_kl": 0.0,
598
+ "eval_logps/chosen": -2241.91552734375,
599
+ "eval_logps/rejected": -1893.757080078125,
600
+ "eval_loss": 0.5,
601
+ "eval_rewards/chosen": -196.24290466308594,
602
+ "eval_rewards/margins": -30.684709548950195,
603
+ "eval_rewards/rejected": -165.55816650390625,
604
+ "eval_runtime": 169.9605,
605
+ "eval_samples_per_second": 2.059,
606
+ "eval_steps_per_second": 0.518,
607
+ "step": 650
608
+ },
609
+ {
610
+ "epoch": 4.36,
611
+ "grad_norm": 0.0,
612
+ "kl": 0.0,
613
+ "learning_rate": 0.00011466666666666667,
614
+ "logps/chosen": -2123.240234375,
615
+ "logps/rejected": NaN,
616
+ "loss": 0.6315,
617
+ "rewards/chosen": -188.09486389160156,
618
+ "rewards/margins": NaN,
619
+ "rewards/rejected": NaN,
620
+ "step": 660
621
+ },
622
+ {
623
+ "epoch": 4.49,
624
+ "grad_norm": 0.0,
625
+ "kl": 0.0,
626
+ "learning_rate": 0.00011185964912280702,
627
+ "logps/chosen": NaN,
628
+ "logps/rejected": NaN,
629
+ "loss": 0.7998,
630
+ "rewards/chosen": NaN,
631
+ "rewards/margins": NaN,
632
+ "rewards/rejected": NaN,
633
+ "step": 680
634
+ },
635
+ {
636
+ "epoch": 4.62,
637
+ "grad_norm": 0.0,
638
+ "kl": 0.0,
639
+ "learning_rate": 0.00010905263157894738,
640
+ "logps/chosen": NaN,
641
+ "logps/rejected": NaN,
642
+ "loss": 0.5881,
643
+ "rewards/chosen": NaN,
644
+ "rewards/margins": NaN,
645
+ "rewards/rejected": NaN,
646
+ "step": 700
647
+ },
648
+ {
649
+ "epoch": 4.62,
650
+ "eval_kl": 0.0,
651
+ "eval_logps/chosen": -2251.134033203125,
652
+ "eval_logps/rejected": -1901.2047119140625,
653
+ "eval_loss": 0.5,
654
+ "eval_rewards/chosen": -197.1647491455078,
655
+ "eval_rewards/margins": -30.8618221282959,
656
+ "eval_rewards/rejected": -166.30291748046875,
657
+ "eval_runtime": 169.9704,
658
+ "eval_samples_per_second": 2.059,
659
+ "eval_steps_per_second": 0.518,
660
+ "step": 700
661
+ },
662
+ {
663
+ "epoch": 4.75,
664
+ "grad_norm": 0.0,
665
+ "kl": 0.0,
666
+ "learning_rate": 0.00010624561403508772,
667
+ "logps/chosen": NaN,
668
+ "logps/rejected": NaN,
669
+ "loss": 0.8756,
670
+ "rewards/chosen": NaN,
671
+ "rewards/margins": NaN,
672
+ "rewards/rejected": NaN,
673
+ "step": 720
674
+ },
675
+ {
676
+ "epoch": 4.88,
677
+ "grad_norm": 0.0,
678
+ "kl": 0.0,
679
+ "learning_rate": 0.00010343859649122807,
680
+ "logps/chosen": NaN,
681
+ "logps/rejected": NaN,
682
+ "loss": 0.6156,
683
+ "rewards/chosen": NaN,
684
+ "rewards/margins": NaN,
685
+ "rewards/rejected": NaN,
686
+ "step": 740
687
+ },
688
+ {
689
+ "epoch": 4.95,
690
+ "eval_kl": 0.0,
691
+ "eval_logps/chosen": -2250.90234375,
692
+ "eval_logps/rejected": -1901.0179443359375,
693
+ "eval_loss": 0.5,
694
+ "eval_rewards/chosen": -197.1415557861328,
695
+ "eval_rewards/margins": -30.857322692871094,
696
+ "eval_rewards/rejected": -166.28424072265625,
697
+ "eval_runtime": 169.9616,
698
+ "eval_samples_per_second": 2.059,
699
+ "eval_steps_per_second": 0.518,
700
+ "step": 750
701
+ },
702
+ {
703
+ "epoch": 5.02,
704
+ "grad_norm": 0.0,
705
+ "kl": 0.0,
706
+ "learning_rate": 0.00010063157894736843,
707
+ "logps/chosen": NaN,
708
+ "logps/rejected": NaN,
709
+ "loss": 0.7376,
710
+ "rewards/chosen": NaN,
711
+ "rewards/margins": NaN,
712
+ "rewards/rejected": NaN,
713
+ "step": 760
714
+ },
715
+ {
716
+ "epoch": 5.15,
717
+ "grad_norm": 0.0,
718
+ "kl": 0.0,
719
+ "learning_rate": 9.782456140350877e-05,
720
+ "logps/chosen": NaN,
721
+ "logps/rejected": NaN,
722
+ "loss": 0.7998,
723
+ "rewards/chosen": NaN,
724
+ "rewards/margins": NaN,
725
+ "rewards/rejected": NaN,
726
+ "step": 780
727
+ },
728
+ {
729
+ "epoch": 5.28,
730
+ "grad_norm": 0.0,
731
+ "kl": 0.0,
732
+ "learning_rate": 9.501754385964913e-05,
733
+ "logps/chosen": NaN,
734
+ "logps/rejected": NaN,
735
+ "loss": 0.6291,
736
+ "rewards/chosen": NaN,
737
+ "rewards/margins": NaN,
738
+ "rewards/rejected": NaN,
739
+ "step": 800
740
+ },
741
+ {
742
+ "epoch": 5.28,
743
+ "eval_kl": 0.0,
744
+ "eval_logps/chosen": -2250.995849609375,
745
+ "eval_logps/rejected": -1901.1036376953125,
746
+ "eval_loss": 0.5,
747
+ "eval_rewards/chosen": -197.15087890625,
748
+ "eval_rewards/margins": -30.858049392700195,
749
+ "eval_rewards/rejected": -166.29283142089844,
750
+ "eval_runtime": 169.941,
751
+ "eval_samples_per_second": 2.06,
752
+ "eval_steps_per_second": 0.518,
753
+ "step": 800
754
+ },
755
+ {
756
+ "epoch": 5.41,
757
+ "grad_norm": 0.0,
758
+ "kl": 0.0,
759
+ "learning_rate": 9.221052631578948e-05,
760
+ "logps/chosen": NaN,
761
+ "logps/rejected": NaN,
762
+ "loss": 0.7167,
763
+ "rewards/chosen": NaN,
764
+ "rewards/margins": NaN,
765
+ "rewards/rejected": NaN,
766
+ "step": 820
767
+ },
768
+ {
769
+ "epoch": 5.54,
770
+ "grad_norm": 0.0,
771
+ "kl": 0.0,
772
+ "learning_rate": 8.940350877192983e-05,
773
+ "logps/chosen": NaN,
774
+ "logps/rejected": NaN,
775
+ "loss": 0.6285,
776
+ "rewards/chosen": NaN,
777
+ "rewards/margins": NaN,
778
+ "rewards/rejected": NaN,
779
+ "step": 840
780
+ },
781
+ {
782
+ "epoch": 5.61,
783
+ "eval_kl": 0.0,
784
+ "eval_logps/chosen": -2251.08837890625,
785
+ "eval_logps/rejected": -1901.1571044921875,
786
+ "eval_loss": 0.5,
787
+ "eval_rewards/chosen": -197.16017150878906,
788
+ "eval_rewards/margins": -30.86201286315918,
789
+ "eval_rewards/rejected": -166.2981719970703,
790
+ "eval_runtime": 169.9583,
791
+ "eval_samples_per_second": 2.059,
792
+ "eval_steps_per_second": 0.518,
793
+ "step": 850
794
+ },
795
+ {
796
+ "epoch": 5.68,
797
+ "grad_norm": 0.0,
798
+ "kl": 0.0,
799
+ "learning_rate": 8.659649122807018e-05,
800
+ "logps/chosen": NaN,
801
+ "logps/rejected": NaN,
802
+ "loss": 0.7898,
803
+ "rewards/chosen": NaN,
804
+ "rewards/margins": NaN,
805
+ "rewards/rejected": NaN,
806
+ "step": 860
807
+ },
808
+ {
809
+ "epoch": 5.81,
810
+ "grad_norm": 0.0,
811
+ "kl": 0.0,
812
+ "learning_rate": 8.378947368421053e-05,
813
+ "logps/chosen": NaN,
814
+ "logps/rejected": NaN,
815
+ "loss": 0.8174,
816
+ "rewards/chosen": NaN,
817
+ "rewards/margins": NaN,
818
+ "rewards/rejected": NaN,
819
+ "step": 880
820
+ },
821
+ {
822
+ "epoch": 5.94,
823
+ "grad_norm": 0.0,
824
+ "kl": 0.0,
825
+ "learning_rate": 8.098245614035088e-05,
826
+ "logps/chosen": NaN,
827
+ "logps/rejected": NaN,
828
+ "loss": 0.6918,
829
+ "rewards/chosen": NaN,
830
+ "rewards/margins": NaN,
831
+ "rewards/rejected": NaN,
832
+ "step": 900
833
+ },
834
+ {
835
+ "epoch": 5.94,
836
+ "eval_kl": 0.0,
837
+ "eval_logps/chosen": -2251.1103515625,
838
+ "eval_logps/rejected": -1901.1773681640625,
839
+ "eval_loss": 0.5,
840
+ "eval_rewards/chosen": -197.16233825683594,
841
+ "eval_rewards/margins": -30.86213493347168,
842
+ "eval_rewards/rejected": -166.30018615722656,
843
+ "eval_runtime": 170.0642,
844
+ "eval_samples_per_second": 2.058,
845
+ "eval_steps_per_second": 0.517,
846
+ "step": 900
847
+ },
848
+ {
849
+ "epoch": 6.07,
850
+ "grad_norm": 0.0,
851
+ "kl": 0.0,
852
+ "learning_rate": 7.817543859649124e-05,
853
+ "logps/chosen": NaN,
854
+ "logps/rejected": NaN,
855
+ "loss": 0.6965,
856
+ "rewards/chosen": NaN,
857
+ "rewards/margins": NaN,
858
+ "rewards/rejected": NaN,
859
+ "step": 920
860
+ },
861
+ {
862
+ "epoch": 6.2,
863
+ "grad_norm": 0.0,
864
+ "kl": 0.0,
865
+ "learning_rate": 7.536842105263158e-05,
866
+ "logps/chosen": NaN,
867
+ "logps/rejected": NaN,
868
+ "loss": 0.7869,
869
+ "rewards/chosen": NaN,
870
+ "rewards/margins": NaN,
871
+ "rewards/rejected": NaN,
872
+ "step": 940
873
+ },
874
+ {
875
+ "epoch": 6.27,
876
+ "eval_kl": 0.0,
877
+ "eval_logps/chosen": -2251.116943359375,
878
+ "eval_logps/rejected": -1901.21484375,
879
+ "eval_loss": 0.5,
880
+ "eval_rewards/chosen": -197.16302490234375,
881
+ "eval_rewards/margins": -30.85906982421875,
882
+ "eval_rewards/rejected": -166.303955078125,
883
+ "eval_runtime": 169.9373,
884
+ "eval_samples_per_second": 2.06,
885
+ "eval_steps_per_second": 0.518,
886
+ "step": 950
887
+ },
888
+ {
889
+ "epoch": 6.34,
890
+ "grad_norm": 0.0,
891
+ "kl": 0.0,
892
+ "learning_rate": 7.256140350877193e-05,
893
+ "logps/chosen": NaN,
894
+ "logps/rejected": NaN,
895
+ "loss": 0.6402,
896
+ "rewards/chosen": NaN,
897
+ "rewards/margins": NaN,
898
+ "rewards/rejected": NaN,
899
+ "step": 960
900
+ },
901
+ {
902
+ "epoch": 6.47,
903
+ "grad_norm": 0.0,
904
+ "kl": 0.0,
905
+ "learning_rate": 6.975438596491229e-05,
906
+ "logps/chosen": NaN,
907
+ "logps/rejected": NaN,
908
+ "loss": 0.8122,
909
+ "rewards/chosen": NaN,
910
+ "rewards/margins": NaN,
911
+ "rewards/rejected": NaN,
912
+ "step": 980
913
+ },
914
+ {
915
+ "epoch": 6.6,
916
+ "grad_norm": 0.0,
917
+ "kl": 0.0,
918
+ "learning_rate": 6.694736842105264e-05,
919
+ "logps/chosen": -2150.89111328125,
920
+ "logps/rejected": NaN,
921
+ "loss": 0.5483,
922
+ "rewards/chosen": -190.25399780273438,
923
+ "rewards/margins": NaN,
924
+ "rewards/rejected": NaN,
925
+ "step": 1000
926
+ },
927
+ {
928
+ "epoch": 6.6,
929
+ "eval_kl": 0.0,
930
+ "eval_logps/chosen": -2251.134521484375,
931
+ "eval_logps/rejected": -1901.1729736328125,
932
+ "eval_loss": 0.5,
933
+ "eval_rewards/chosen": -197.16481018066406,
934
+ "eval_rewards/margins": -30.865028381347656,
935
+ "eval_rewards/rejected": -166.29977416992188,
936
+ "eval_runtime": 169.9607,
937
+ "eval_samples_per_second": 2.059,
938
+ "eval_steps_per_second": 0.518,
939
+ "step": 1000
940
+ }
941
+ ],
942
+ "logging_steps": 20,
943
+ "max_steps": 1470,
944
+ "num_input_tokens_seen": 0,
945
+ "num_train_epochs": 10,
946
+ "save_steps": 100,
947
+ "total_flos": 0.0,
948
+ "train_batch_size": 4,
949
+ "trial_name": null,
950
+ "trial_params": null
951
+ }
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ae5309801a19049c58de3649400afbb558334e14e33dff69ca022789cf2400ea
3
+ size 5688
checkpoint-1100/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ library_name: peft
3
+ base_model: HuggingFaceH4/zephyr-7b-beta
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.10.0
checkpoint-1100/adapter_config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "HuggingFaceH4/zephyr-7b-beta",
5
+ "bias": "none",
6
+ "fan_in_fan_out": false,
7
+ "inference_mode": true,
8
+ "init_lora_weights": true,
9
+ "layer_replication": null,
10
+ "layers_pattern": null,
11
+ "layers_to_transform": null,
12
+ "loftq_config": {},
13
+ "lora_alpha": 16,
14
+ "lora_dropout": 0.05,
15
+ "megatron_config": null,
16
+ "megatron_core": "megatron.core",
17
+ "modules_to_save": null,
18
+ "peft_type": "LORA",
19
+ "r": 8,
20
+ "rank_pattern": {},
21
+ "revision": null,
22
+ "target_modules": [
23
+ "gate_proj",
24
+ "up_proj",
25
+ "k_proj",
26
+ "q_proj",
27
+ "down_proj",
28
+ "v_proj",
29
+ "o_proj",
30
+ "lm_head",
31
+ "embed_tokens"
32
+ ],
33
+ "task_type": "CAUSAL_LM",
34
+ "use_dora": false,
35
+ "use_rslora": false
36
+ }
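The adapter config above specifies LoRA with rank `r: 8` and `lora_alpha: 16`, i.e. a scaling factor of alpha/r = 2, and `init_lora_weights: true` (B starts at zero, so the adapter is initially a no-op). A minimal sketch of how such a LoRA delta modifies a linear layer — illustrative only, not the `peft` implementation; names and shapes here are assumptions:

```python
import numpy as np

def lora_forward(x, W, A, B, r=8, lora_alpha=16):
    """Apply a base linear map plus a scaled low-rank LoRA update.

    x: (batch, d_in), W: (d_in, d_out) frozen base weight,
    A: (d_in, r), B: (r, d_out) trainable low-rank factors.
    Effective weight is W + (lora_alpha / r) * A @ B.
    """
    scaling = lora_alpha / r
    return x @ W + scaling * (x @ A) @ B

# With B initialized to zeros (as init_lora_weights: true does),
# the adapted layer reproduces the base layer exactly.
x = np.ones((2, 4))
W = np.eye(4)
A = np.ones((4, 8))
B = np.zeros((8, 4))
y = lora_forward(x, W, A, B)
```

At initialization the output equals `x @ W`; training then only updates the small `A`/`B` factors, which is why the saved `adapter_model.safetensors` is far smaller than the base model.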
checkpoint-1100/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f9a371c9c74dc6e7feb2a05b1dd39007dc16a5f35b76caab9ce2ac483ad7ee46
3
+ size 1134834064
checkpoint-1100/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aaf210564e50826b11ada69542b76a3ce19a4b9afc9b275dcce2cbf6af5c1b54
3
+ size 172772766
checkpoint-1100/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7c94c375fe5ad2903d244ca6b5cc2a1a6cba4c0c26196f3b9cbd9ddd170bb0b8
3
+ size 14244
checkpoint-1100/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1e174f59377592a49c6fe6f5bccb5b55023170f3201a81a5156b2afa97d5d99e
3
+ size 1064
checkpoint-1100/special_tokens_map.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<unk>",
4
+ "<s>",
5
+ "</s>"
6
+ ],
7
+ "bos_token": {
8
+ "content": "<s>",
9
+ "lstrip": false,
10
+ "normalized": false,
11
+ "rstrip": false,
12
+ "single_word": false
13
+ },
14
+ "eos_token": {
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "pad_token": "<unk>",
22
+ "unk_token": {
23
+ "content": "<unk>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false
28
+ }
29
+ }
checkpoint-1100/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1100/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
checkpoint-1100/tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "additional_special_tokens": [
31
+ "<unk>",
32
+ "<s>",
33
+ "</s>"
34
+ ],
35
+ "bos_token": "<s>",
36
+ "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
37
+ "clean_up_tokenization_spaces": false,
38
+ "eos_token": "</s>",
39
+ "legacy": true,
40
+ "max_lenght": 8192,
41
+ "model_max_length": 1000000000000000019884624838656,
42
+ "pad_token": "<unk>",
43
+ "padding": true,
44
+ "sp_model_kwargs": {},
45
+ "spaces_between_special_tokens": false,
46
+ "tokenizer_class": "LlamaTokenizer",
47
+ "truncation_side": "left",
48
+ "unk_token": "<unk>",
49
+ "use_default_system_prompt": true
50
+ }
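The `chat_template` in the tokenizer config above renders each message as `<|role|>` on its own line, followed by the content and the EOS token, with a bare `<|assistant|>` appended when a generation prompt is requested. A pure-Python sketch of the same formatting (a hypothetical helper, not part of the repo; normally you would call `tokenizer.apply_chat_template` instead):

```python
EOS = "</s>"  # eos_token from the tokenizer config above

def render_zephyr_chat(messages, add_generation_prompt=False):
    """Mirror the jinja chat_template: '<|role|>\\n' + content + eos per message."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS}\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = render_zephyr_chat(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
```

For the single user message above this yields `<|user|>\nHello</s>\n<|assistant|>\n`, the standard zephyr turn format.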
checkpoint-1100/trainer_state.json ADDED
@@ -0,0 +1,1044 @@
1
+ {
2
+ "best_metric": 0.5,
3
+ "best_model_checkpoint": "./zephyr/10-04-24-Weni-WeniGPT-Agents-Zephyr-1.0.25-KTO_Experiment with a new tokenizer configuration for chat template of zephyr-2_max_steps-1470_batch_16_2024-04-10_ppid_9/checkpoint-300",
4
+ "epoch": 7.260726072607261,
5
+ "eval_steps": 50,
6
+ "global_step": 1100,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.13,
13
+ "grad_norm": 57.293792724609375,
14
+ "kl": 0.03853478282690048,
15
+ "learning_rate": 6.222222222222222e-05,
16
+ "logps/chosen": NaN,
17
+ "logps/rejected": NaN,
18
+ "loss": 0.7078,
19
+ "rewards/chosen": NaN,
20
+ "rewards/margins": NaN,
21
+ "rewards/rejected": NaN,
22
+ "step": 20
23
+ },
24
+ {
25
+ "epoch": 0.26,
26
+ "grad_norm": 112.50944519042969,
27
+ "kl": 3.2648494243621826,
28
+ "learning_rate": 0.00014666666666666666,
29
+ "logps/chosen": NaN,
30
+ "logps/rejected": NaN,
31
+ "loss": 0.6966,
32
+ "rewards/chosen": NaN,
33
+ "rewards/margins": NaN,
34
+ "rewards/rejected": NaN,
35
+ "step": 40
36
+ },
37
+ {
38
+ "epoch": 0.33,
39
+ "eval_kl": 0.0,
40
+ "eval_logps/chosen": -413.6161193847656,
41
+ "eval_logps/rejected": -362.2559509277344,
42
+ "eval_loss": 0.5063381791114807,
43
+ "eval_rewards/chosen": -13.412939071655273,
44
+ "eval_rewards/margins": -1.0048810243606567,
45
+ "eval_rewards/rejected": -12.408059120178223,
46
+ "eval_runtime": 170.1826,
47
+ "eval_samples_per_second": 2.057,
48
+ "eval_steps_per_second": 0.517,
49
+ "step": 50
50
+ },
51
+ {
52
+ "epoch": 0.4,
53
+ "grad_norm": 19.94582748413086,
54
+ "kl": 0.45922356843948364,
55
+ "learning_rate": 0.00019887719298245616,
56
+ "logps/chosen": NaN,
57
+ "logps/rejected": NaN,
58
+ "loss": 0.5743,
59
+ "rewards/chosen": NaN,
60
+ "rewards/margins": NaN,
61
+ "rewards/rejected": NaN,
62
+ "step": 60
63
+ },
64
+ {
65
+ "epoch": 0.53,
66
+ "grad_norm": 79.92957305908203,
67
+ "kl": 0.0,
68
+ "learning_rate": 0.0001960701754385965,
69
+ "logps/chosen": NaN,
70
+ "logps/rejected": NaN,
71
+ "loss": 0.6108,
72
+ "rewards/chosen": NaN,
73
+ "rewards/margins": NaN,
74
+ "rewards/rejected": NaN,
75
+ "step": 80
76
+ },
77
+ {
78
+ "epoch": 0.66,
79
+ "grad_norm": 0.06103940308094025,
80
+ "kl": 0.0,
81
+ "learning_rate": 0.00019326315789473686,
82
+ "logps/chosen": NaN,
83
+ "logps/rejected": NaN,
84
+ "loss": 0.754,
85
+ "rewards/chosen": NaN,
86
+ "rewards/margins": NaN,
87
+ "rewards/rejected": NaN,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.66,
92
+ "eval_kl": 0.0,
93
+ "eval_logps/chosen": -2027.0018310546875,
94
+ "eval_logps/rejected": -1697.82177734375,
95
+ "eval_loss": 0.5000000596046448,
96
+ "eval_rewards/chosen": -174.75149536132812,
97
+ "eval_rewards/margins": -28.786863327026367,
98
+ "eval_rewards/rejected": -145.96463012695312,
99
+ "eval_runtime": 170.0562,
100
+ "eval_samples_per_second": 2.058,
101
+ "eval_steps_per_second": 0.517,
102
+ "step": 100
103
+ },
104
+ {
105
+ "epoch": 0.79,
106
+ "grad_norm": 0.0,
107
+ "kl": 0.0,
108
+ "learning_rate": 0.0001904561403508772,
109
+ "logps/chosen": NaN,
110
+ "logps/rejected": NaN,
111
+ "loss": 0.95,
112
+ "rewards/chosen": NaN,
113
+ "rewards/margins": NaN,
114
+ "rewards/rejected": NaN,
115
+ "step": 120
116
+ },
117
+ {
118
+ "epoch": 0.92,
119
+ "grad_norm": 0.0,
120
+ "kl": 0.0,
121
+ "learning_rate": 0.00018764912280701756,
122
+ "logps/chosen": NaN,
123
+ "logps/rejected": NaN,
124
+ "loss": 0.6274,
125
+ "rewards/chosen": NaN,
126
+ "rewards/margins": NaN,
127
+ "rewards/rejected": NaN,
128
+ "step": 140
129
+ },
130
+ {
131
+ "epoch": 0.99,
132
+ "eval_kl": 0.0,
133
+ "eval_logps/chosen": -2237.81494140625,
134
+ "eval_logps/rejected": -1889.774169921875,
135
+ "eval_loss": 0.5,
136
+ "eval_rewards/chosen": -195.83285522460938,
137
+ "eval_rewards/margins": -30.672954559326172,
138
+ "eval_rewards/rejected": -165.15989685058594,
139
+ "eval_runtime": 169.8795,
140
+ "eval_samples_per_second": 2.06,
141
+ "eval_steps_per_second": 0.518,
142
+ "step": 150
143
+ },
144
+ {
145
+ "epoch": 1.06,
146
+ "grad_norm": 0.0,
147
+ "kl": 0.0,
148
+ "learning_rate": 0.0001848421052631579,
149
+ "logps/chosen": NaN,
150
+ "logps/rejected": NaN,
151
+ "loss": 0.6387,
152
+ "rewards/chosen": NaN,
153
+ "rewards/margins": NaN,
154
+ "rewards/rejected": NaN,
155
+ "step": 160
156
+ },
157
+ {
158
+ "epoch": 1.19,
159
+ "grad_norm": 0.0,
160
+ "kl": 0.0,
161
+ "learning_rate": 0.00018203508771929826,
162
+ "logps/chosen": NaN,
163
+ "logps/rejected": NaN,
164
+ "loss": 0.8327,
165
+ "rewards/chosen": NaN,
166
+ "rewards/margins": NaN,
167
+ "rewards/rejected": NaN,
168
+ "step": 180
169
+ },
170
+ {
171
+ "epoch": 1.32,
172
+ "grad_norm": 0.0,
173
+ "kl": 0.0,
174
+ "learning_rate": 0.00017922807017543862,
175
+ "logps/chosen": NaN,
176
+ "logps/rejected": NaN,
177
+ "loss": 0.642,
178
+ "rewards/chosen": NaN,
179
+ "rewards/margins": NaN,
180
+ "rewards/rejected": NaN,
181
+ "step": 200
182
+ },
183
+ {
184
+ "epoch": 1.32,
185
+ "eval_kl": 0.0,
186
+ "eval_logps/chosen": -2230.916259765625,
187
+ "eval_logps/rejected": -1884.9520263671875,
188
+ "eval_loss": 0.5000000596046448,
189
+ "eval_rewards/chosen": -195.14297485351562,
190
+ "eval_rewards/margins": -30.465293884277344,
191
+ "eval_rewards/rejected": -164.67767333984375,
192
+ "eval_runtime": 170.1489,
193
+ "eval_samples_per_second": 2.057,
194
+ "eval_steps_per_second": 0.517,
195
+ "step": 200
196
+ },
197
+ {
198
+ "epoch": 1.45,
199
+ "grad_norm": 0.0,
200
+ "kl": 0.0,
201
+ "learning_rate": 0.00017642105263157896,
202
+ "logps/chosen": NaN,
203
+ "logps/rejected": NaN,
204
+ "loss": 0.7493,
205
+ "rewards/chosen": NaN,
206
+ "rewards/margins": NaN,
207
+ "rewards/rejected": NaN,
208
+ "step": 220
209
+ },
210
+ {
211
+ "epoch": 1.58,
212
+ "grad_norm": 0.0,
213
+ "kl": 0.0,
214
+ "learning_rate": 0.0001736140350877193,
215
+ "logps/chosen": NaN,
216
+ "logps/rejected": NaN,
217
+ "loss": 0.6241,
218
+ "rewards/chosen": NaN,
219
+ "rewards/margins": NaN,
220
+ "rewards/rejected": NaN,
221
+ "step": 240
222
+ },
223
+ {
224
+ "epoch": 1.65,
225
+ "eval_kl": 0.0,
226
+ "eval_logps/chosen": -2230.957275390625,
227
+ "eval_logps/rejected": -1885.0225830078125,
228
+ "eval_loss": 0.5000000596046448,
229
+ "eval_rewards/chosen": -195.14706420898438,
230
+ "eval_rewards/margins": -30.462318420410156,
231
+ "eval_rewards/rejected": -164.68475341796875,
232
+ "eval_runtime": 170.1092,
233
+ "eval_samples_per_second": 2.058,
234
+ "eval_steps_per_second": 0.517,
235
+ "step": 250
236
+ },
237
+ {
238
+ "epoch": 1.72,
239
+ "grad_norm": 0.0,
240
+ "kl": 0.0,
241
+ "learning_rate": 0.00017080701754385965,
242
+ "logps/chosen": NaN,
243
+ "logps/rejected": NaN,
244
+ "loss": 0.9621,
245
+ "rewards/chosen": NaN,
246
+ "rewards/margins": NaN,
247
+ "rewards/rejected": NaN,
248
+ "step": 260
249
+ },
250
+ {
251
+ "epoch": 1.85,
252
+ "grad_norm": 0.0,
253
+ "kl": 0.0,
254
+ "learning_rate": 0.000168,
255
+ "logps/chosen": NaN,
256
+ "logps/rejected": NaN,
257
+ "loss": 0.7279,
258
+ "rewards/chosen": NaN,
259
+ "rewards/margins": NaN,
260
+ "rewards/rejected": NaN,
261
+ "step": 280
262
+ },
263
+ {
264
+ "epoch": 1.98,
265
+ "grad_norm": 0.0,
266
+ "kl": 0.0,
267
+ "learning_rate": 0.00016519298245614035,
268
+ "logps/chosen": NaN,
269
+ "logps/rejected": NaN,
270
+ "loss": 0.7477,
271
+ "rewards/chosen": NaN,
272
+ "rewards/margins": NaN,
273
+ "rewards/rejected": NaN,
274
+ "step": 300
275
+ },
276
+ {
277
+ "epoch": 1.98,
278
+ "eval_kl": 0.0,
279
+ "eval_logps/chosen": -2238.164306640625,
280
+ "eval_logps/rejected": -1890.7996826171875,
281
+ "eval_loss": 0.5,
282
+ "eval_rewards/chosen": -195.86773681640625,
283
+ "eval_rewards/margins": -30.605329513549805,
284
+ "eval_rewards/rejected": -165.26242065429688,
285
+ "eval_runtime": 170.0647,
286
+ "eval_samples_per_second": 2.058,
287
+ "eval_steps_per_second": 0.517,
288
+ "step": 300
289
+ },
290
+ {
291
+ "epoch": 2.11,
292
+ "grad_norm": 0.0,
293
+ "kl": 0.0,
294
+ "learning_rate": 0.00016238596491228072,
295
+ "logps/chosen": NaN,
296
+ "logps/rejected": NaN,
297
+ "loss": 0.7111,
298
+ "rewards/chosen": NaN,
299
+ "rewards/margins": NaN,
300
+ "rewards/rejected": NaN,
301
+ "step": 320
302
+ },
303
+ {
304
+ "epoch": 2.24,
305
+ "grad_norm": 0.0,
306
+ "kl": 0.0,
307
+ "learning_rate": 0.00015957894736842105,
308
+ "logps/chosen": NaN,
309
+ "logps/rejected": NaN,
310
+ "loss": 0.8685,
311
+ "rewards/chosen": NaN,
312
+ "rewards/margins": NaN,
313
+ "rewards/rejected": NaN,
314
+ "step": 340
315
+ },
316
+ {
317
+ "epoch": 2.31,
318
+ "eval_kl": 0.0,
319
+ "eval_logps/chosen": -2238.054931640625,
320
+ "eval_logps/rejected": -1890.694580078125,
321
+ "eval_loss": 0.5,
322
+ "eval_rewards/chosen": -195.85682678222656,
323
+ "eval_rewards/margins": -30.604921340942383,
324
+ "eval_rewards/rejected": -165.2519073486328,
325
+ "eval_runtime": 170.0528,
326
+ "eval_samples_per_second": 2.058,
327
+ "eval_steps_per_second": 0.517,
328
+ "step": 350
329
+ },
330
+ {
331
+ "epoch": 2.38,
332
+ "grad_norm": 0.0,
333
+ "kl": 0.0,
334
+ "learning_rate": 0.00015677192982456142,
335
+ "logps/chosen": NaN,
336
+ "logps/rejected": NaN,
337
+ "loss": 0.6905,
338
+ "rewards/chosen": NaN,
339
+ "rewards/margins": NaN,
340
+ "rewards/rejected": NaN,
341
+ "step": 360
342
+ },
343
+ {
344
+ "epoch": 2.51,
345
+ "grad_norm": 0.0,
346
+ "kl": 0.0,
347
+ "learning_rate": 0.00015396491228070175,
348
+ "logps/chosen": NaN,
349
+ "logps/rejected": NaN,
350
+ "loss": 0.736,
351
+ "rewards/chosen": NaN,
352
+ "rewards/margins": NaN,
353
+ "rewards/rejected": NaN,
354
+ "step": 380
355
+ },
356
+ {
357
+ "epoch": 2.64,
358
+ "grad_norm": 0.0,
359
+ "kl": 0.0,
360
+ "learning_rate": 0.00015115789473684211,
361
+ "logps/chosen": NaN,
362
+ "logps/rejected": NaN,
363
+ "loss": 0.693,
364
+ "rewards/chosen": NaN,
365
+ "rewards/margins": NaN,
366
+ "rewards/rejected": NaN,
367
+ "step": 400
368
+ },
369
+ {
370
+ "epoch": 2.64,
371
+ "eval_kl": 0.0,
372
+ "eval_logps/chosen": -2237.827392578125,
373
+ "eval_logps/rejected": -1890.5028076171875,
374
+ "eval_loss": 0.5,
375
+ "eval_rewards/chosen": -195.83407592773438,
376
+ "eval_rewards/margins": -30.601318359375,
377
+ "eval_rewards/rejected": -165.23275756835938,
378
+ "eval_runtime": 170.2445,
379
+ "eval_samples_per_second": 2.056,
380
+ "eval_steps_per_second": 0.517,
381
+ "step": 400
382
+ },
383
+ {
384
+ "epoch": 2.77,
385
+ "grad_norm": 8.788210266175156e-07,
386
+ "kl": 0.0,
387
+ "learning_rate": 0.00014835087719298245,
388
+ "logps/chosen": NaN,
389
+ "logps/rejected": NaN,
390
+ "loss": 0.8652,
391
+ "rewards/chosen": NaN,
392
+ "rewards/margins": NaN,
393
+ "rewards/rejected": NaN,
394
+ "step": 420
395
+ },
396
+ {
397
+ "epoch": 2.9,
398
+ "grad_norm": 0.0,
399
+ "kl": 0.0,
400
+ "learning_rate": 0.0001455438596491228,
401
+ "logps/chosen": NaN,
402
+ "logps/rejected": NaN,
403
+ "loss": 0.686,
404
+ "rewards/chosen": NaN,
405
+ "rewards/margins": NaN,
406
+ "rewards/rejected": NaN,
407
+ "step": 440
408
+ },
409
+ {
410
+ "epoch": 2.97,
411
+ "eval_kl": 0.0,
412
+ "eval_logps/chosen": -2237.722412109375,
413
+ "eval_logps/rejected": -1890.4027099609375,
414
+ "eval_loss": 0.5,
415
+ "eval_rewards/chosen": -195.82354736328125,
416
+ "eval_rewards/margins": -30.600812911987305,
417
+ "eval_rewards/rejected": -165.22274780273438,
418
+ "eval_runtime": 170.3429,
419
+ "eval_samples_per_second": 2.055,
420
+ "eval_steps_per_second": 0.517,
421
+ "step": 450
422
+ },
423
+ {
424
+ "epoch": 3.04,
425
+ "grad_norm": 0.0,
426
+ "kl": 0.0,
427
+ "learning_rate": 0.00014273684210526318,
428
+ "logps/chosen": NaN,
429
+ "logps/rejected": NaN,
430
+ "loss": 0.6858,
431
+ "rewards/chosen": NaN,
432
+ "rewards/margins": NaN,
433
+ "rewards/rejected": NaN,
434
+ "step": 460
435
+ },
436
+ {
437
+ "epoch": 3.17,
438
+ "grad_norm": 0.0,
439
+ "kl": 0.0,
440
+ "learning_rate": 0.0001399298245614035,
441
+ "logps/chosen": NaN,
442
+ "logps/rejected": NaN,
443
+ "loss": 0.8479,
444
+ "rewards/chosen": NaN,
445
+ "rewards/margins": NaN,
446
+ "rewards/rejected": NaN,
447
+ "step": 480
448
+ },
449
+ {
450
+ "epoch": 3.3,
451
+ "grad_norm": 0.0,
452
+ "kl": 0.0,
453
+ "learning_rate": 0.00013712280701754388,
454
+ "logps/chosen": NaN,
455
+ "logps/rejected": NaN,
456
+ "loss": 0.6119,
457
+ "rewards/chosen": NaN,
458
+ "rewards/margins": NaN,
459
+ "rewards/rejected": NaN,
460
+ "step": 500
461
+ },
462
+ {
463
+ "epoch": 3.3,
464
+ "eval_kl": 0.0,
465
+ "eval_logps/chosen": -2237.6083984375,
466
+ "eval_logps/rejected": -1890.3140869140625,
467
+ "eval_loss": 0.5,
468
+ "eval_rewards/chosen": -195.81216430664062,
469
+ "eval_rewards/margins": -30.598268508911133,
470
+ "eval_rewards/rejected": -165.21388244628906,
471
+ "eval_runtime": 169.9488,
472
+ "eval_samples_per_second": 2.059,
473
+ "eval_steps_per_second": 0.518,
474
+ "step": 500
475
+ },
476
+ {
477
+ "epoch": 3.43,
478
+ "grad_norm": 0.0,
479
+ "kl": 0.0,
480
+ "learning_rate": 0.0001343157894736842,
481
+ "logps/chosen": NaN,
482
+ "logps/rejected": NaN,
483
+ "loss": 0.7107,
484
+ "rewards/chosen": NaN,
485
+ "rewards/margins": NaN,
486
+ "rewards/rejected": NaN,
487
+ "step": 520
488
+ },
489
+ {
490
+ "epoch": 3.56,
491
+ "grad_norm": 0.0,
492
+ "kl": 0.0,
493
+ "learning_rate": 0.00013150877192982455,
494
+ "logps/chosen": NaN,
495
+ "logps/rejected": NaN,
496
+ "loss": 0.5902,
497
+ "rewards/chosen": NaN,
498
+ "rewards/margins": NaN,
499
+ "rewards/rejected": NaN,
500
+ "step": 540
501
+ },
502
+ {
503
+ "epoch": 3.63,
504
+ "eval_kl": 0.0,
505
+ "eval_logps/chosen": -2237.56494140625,
506
+ "eval_logps/rejected": -1890.3043212890625,
507
+ "eval_loss": 0.5,
508
+ "eval_rewards/chosen": -195.80784606933594,
509
+ "eval_rewards/margins": -30.59491539001465,
510
+ "eval_rewards/rejected": -165.21290588378906,
511
+ "eval_runtime": 169.9756,
512
+ "eval_samples_per_second": 2.059,
513
+ "eval_steps_per_second": 0.518,
514
+ "step": 550
515
+ },
516
+ {
517
+ "epoch": 3.7,
518
+ "grad_norm": 0.0,
519
+ "kl": 0.0,
520
+ "learning_rate": 0.0001287017543859649,
521
+ "logps/chosen": NaN,
522
+ "logps/rejected": NaN,
523
+ "loss": 0.9042,
524
+ "rewards/chosen": NaN,
525
+ "rewards/margins": NaN,
526
+ "rewards/rejected": NaN,
527
+ "step": 560
528
+ },
529
+ {
530
+ "epoch": 3.83,
531
+ "grad_norm": 0.0,
532
+ "kl": 0.0,
533
+ "learning_rate": 0.00012589473684210527,
534
+ "logps/chosen": NaN,
535
+ "logps/rejected": NaN,
536
+ "loss": 0.7268,
537
+ "rewards/chosen": NaN,
538
+ "rewards/margins": NaN,
539
+ "rewards/rejected": NaN,
540
+ "step": 580
541
+ },
542
+ {
543
+ "epoch": 3.96,
544
+ "grad_norm": 0.0,
545
+ "kl": 0.0,
546
+ "learning_rate": 0.00012308771929824564,
547
+ "logps/chosen": NaN,
548
+ "logps/rejected": NaN,
549
+ "loss": 0.7106,
550
+ "rewards/chosen": NaN,
551
+ "rewards/margins": NaN,
552
+ "rewards/rejected": NaN,
553
+ "step": 600
554
+ },
555
+ {
556
+ "epoch": 3.96,
557
+ "eval_kl": 0.0,
558
+ "eval_logps/chosen": -2241.97509765625,
559
+ "eval_logps/rejected": -1893.87646484375,
560
+ "eval_loss": 0.5,
561
+ "eval_rewards/chosen": -196.24884033203125,
562
+ "eval_rewards/margins": -30.67871856689453,
563
+ "eval_rewards/rejected": -165.57012939453125,
564
+ "eval_runtime": 169.9427,
565
+ "eval_samples_per_second": 2.06,
566
+ "eval_steps_per_second": 0.518,
567
+ "step": 600
568
+ },
569
+ {
570
+ "epoch": 4.09,
571
+ "grad_norm": 0.0,
572
+ "kl": 0.0,
573
+ "learning_rate": 0.00012028070175438597,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6829,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 620
+ },
+ {
+ "epoch": 4.22,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00011747368421052631,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8232,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 640
+ },
+ {
+ "epoch": 4.29,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2241.91552734375,
+ "eval_logps/rejected": -1893.757080078125,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -196.24290466308594,
+ "eval_rewards/margins": -30.684709548950195,
+ "eval_rewards/rejected": -165.55816650390625,
+ "eval_runtime": 169.9605,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 650
+ },
+ {
+ "epoch": 4.36,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00011466666666666667,
+ "logps/chosen": -2123.240234375,
+ "logps/rejected": NaN,
+ "loss": 0.6315,
+ "rewards/chosen": -188.09486389160156,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 660
+ },
+ {
+ "epoch": 4.49,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00011185964912280702,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7998,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 680
+ },
+ {
+ "epoch": 4.62,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00010905263157894738,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.5881,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 700
+ },
+ {
+ "epoch": 4.62,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2251.134033203125,
+ "eval_logps/rejected": -1901.2047119140625,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.1647491455078,
+ "eval_rewards/margins": -30.8618221282959,
+ "eval_rewards/rejected": -166.30291748046875,
+ "eval_runtime": 169.9704,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 700
+ },
+ {
+ "epoch": 4.75,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00010624561403508772,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8756,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 720
+ },
+ {
+ "epoch": 4.88,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00010343859649122807,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6156,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 740
+ },
+ {
+ "epoch": 4.95,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2250.90234375,
+ "eval_logps/rejected": -1901.0179443359375,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.1415557861328,
+ "eval_rewards/margins": -30.857322692871094,
+ "eval_rewards/rejected": -166.28424072265625,
+ "eval_runtime": 169.9616,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 750
+ },
+ {
+ "epoch": 5.02,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00010063157894736843,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7376,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 760
+ },
+ {
+ "epoch": 5.15,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 9.782456140350877e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7998,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 780
+ },
+ {
+ "epoch": 5.28,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 9.501754385964913e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6291,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 800
+ },
+ {
+ "epoch": 5.28,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2250.995849609375,
+ "eval_logps/rejected": -1901.1036376953125,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.15087890625,
+ "eval_rewards/margins": -30.858049392700195,
+ "eval_rewards/rejected": -166.29283142089844,
+ "eval_runtime": 169.941,
+ "eval_samples_per_second": 2.06,
+ "eval_steps_per_second": 0.518,
+ "step": 800
+ },
+ {
+ "epoch": 5.41,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 9.221052631578948e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7167,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 820
+ },
+ {
+ "epoch": 5.54,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 8.940350877192983e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6285,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 840
+ },
+ {
+ "epoch": 5.61,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2251.08837890625,
+ "eval_logps/rejected": -1901.1571044921875,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.16017150878906,
+ "eval_rewards/margins": -30.86201286315918,
+ "eval_rewards/rejected": -166.2981719970703,
+ "eval_runtime": 169.9583,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 850
+ },
+ {
+ "epoch": 5.68,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 8.659649122807018e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7898,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 860
+ },
+ {
+ "epoch": 5.81,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 8.378947368421053e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8174,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 880
+ },
+ {
+ "epoch": 5.94,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 8.098245614035088e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6918,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 900
+ },
+ {
+ "epoch": 5.94,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2251.1103515625,
+ "eval_logps/rejected": -1901.1773681640625,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.16233825683594,
+ "eval_rewards/margins": -30.86213493347168,
+ "eval_rewards/rejected": -166.30018615722656,
+ "eval_runtime": 170.0642,
+ "eval_samples_per_second": 2.058,
+ "eval_steps_per_second": 0.517,
+ "step": 900
+ },
+ {
+ "epoch": 6.07,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 7.817543859649124e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6965,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 920
+ },
+ {
+ "epoch": 6.2,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 7.536842105263158e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7869,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 940
+ },
+ {
+ "epoch": 6.27,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2251.116943359375,
+ "eval_logps/rejected": -1901.21484375,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.16302490234375,
+ "eval_rewards/margins": -30.85906982421875,
+ "eval_rewards/rejected": -166.303955078125,
+ "eval_runtime": 169.9373,
+ "eval_samples_per_second": 2.06,
+ "eval_steps_per_second": 0.518,
+ "step": 950
+ },
+ {
+ "epoch": 6.34,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 7.256140350877193e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6402,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 960
+ },
+ {
+ "epoch": 6.47,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 6.975438596491229e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8122,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 980
+ },
+ {
+ "epoch": 6.6,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 6.694736842105264e-05,
+ "logps/chosen": -2150.89111328125,
+ "logps/rejected": NaN,
+ "loss": 0.5483,
+ "rewards/chosen": -190.25399780273438,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1000
+ },
+ {
+ "epoch": 6.6,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2251.134521484375,
+ "eval_logps/rejected": -1901.1729736328125,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.16481018066406,
+ "eval_rewards/margins": -30.865028381347656,
+ "eval_rewards/rejected": -166.29977416992188,
+ "eval_runtime": 169.9607,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 1000
+ },
+ {
+ "epoch": 6.73,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 6.414035087719299e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 1.0998,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1020
+ },
+ {
+ "epoch": 6.86,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 6.133333333333334e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7744,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1040
+ },
+ {
+ "epoch": 6.93,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2254.820068359375,
+ "eval_logps/rejected": -1904.1441650390625,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.53334045410156,
+ "eval_rewards/margins": -30.936431884765625,
+ "eval_rewards/rejected": -166.59690856933594,
+ "eval_runtime": 169.9328,
+ "eval_samples_per_second": 2.06,
+ "eval_steps_per_second": 0.518,
+ "step": 1050
+ },
+ {
+ "epoch": 7.0,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 5.852631578947369e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7891,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1060
+ },
+ {
+ "epoch": 7.13,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 5.571929824561404e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7203,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1080
+ },
+ {
+ "epoch": 7.26,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 5.291228070175439e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.9077,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 1100
+ },
+ {
+ "epoch": 7.26,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2254.888427734375,
+ "eval_logps/rejected": -1904.1827392578125,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -197.54017639160156,
+ "eval_rewards/margins": -30.939420700073242,
+ "eval_rewards/rejected": -166.6007537841797,
+ "eval_runtime": 169.9818,
+ "eval_samples_per_second": 2.059,
+ "eval_steps_per_second": 0.518,
+ "step": 1100
+ }
+ ],
+ "logging_steps": 20,
+ "max_steps": 1470,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 100,
+ "total_flos": 0.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-1100/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae5309801a19049c58de3649400afbb558334e14e33dff69ca022789cf2400ea
+ size 5688
checkpoint-1200/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: HuggingFaceH4/zephyr-7b-beta
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
checkpoint-1200/adapter_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "HuggingFaceH4/zephyr-7b-beta",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "up_proj",
+ "k_proj",
+ "q_proj",
+ "down_proj",
+ "v_proj",
+ "o_proj",
+ "lm_head",
+ "embed_tokens"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
checkpoint-1200/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a34892bba9479a33faeb36ceec4290aeba59d3a82abf645874a81f3fd670f9b
+ size 1134834064
checkpoint-1200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:99903a5b8db99d3795bacf076d4b9cc93979ba0ebaf6f5895453ca3aa999dfec
+ size 172772766
checkpoint-1200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27a79239f98d586c6d293becfe4724cb48ad892f743d1e770886cde54b3333d6
+ size 14244
checkpoint-1200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3d496c0c5cba2ece3b6fbeaa4eb9daaa3402ba1e3d2b995451385e27b0416f5
+ size 1064
checkpoint-1200/special_tokens_map.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "additional_special_tokens": [
+ "<unk>",
+ "<s>",
+ "</s>"
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<unk>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1200/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1200/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
checkpoint-1200/tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<unk>",
+ "<s>",
+ "</s>"
+ ],
+ "bos_token": "<s>",
+ "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": true,
+ "max_lenght": 8192,
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<unk>",
+ "padding": true,
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "truncation_side": "left",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": true
+ }
checkpoint-1200/trainer_state.json ADDED
@@ -0,0 +1,1137 @@
+ {
+ "best_metric": 0.5,
+ "best_model_checkpoint": "./zephyr/10-04-24-Weni-WeniGPT-Agents-Zephyr-1.0.25-KTO_Experiment with a new tokenizer configuration for chat template of zephyr-2_max_steps-1470_batch_16_2024-04-10_ppid_9/checkpoint-300",
+ "epoch": 7.920792079207921,
+ "eval_steps": 50,
+ "global_step": 1200,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.13,
+ "grad_norm": 57.293792724609375,
+ "kl": 0.03853478282690048,
+ "learning_rate": 6.222222222222222e-05,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7078,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 20
+ },
+ {
+ "epoch": 0.26,
+ "grad_norm": 112.50944519042969,
+ "kl": 3.2648494243621826,
+ "learning_rate": 0.00014666666666666666,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6966,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 40
+ },
+ {
+ "epoch": 0.33,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -413.6161193847656,
+ "eval_logps/rejected": -362.2559509277344,
+ "eval_loss": 0.5063381791114807,
+ "eval_rewards/chosen": -13.412939071655273,
+ "eval_rewards/margins": -1.0048810243606567,
+ "eval_rewards/rejected": -12.408059120178223,
+ "eval_runtime": 170.1826,
+ "eval_samples_per_second": 2.057,
+ "eval_steps_per_second": 0.517,
+ "step": 50
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 19.94582748413086,
+ "kl": 0.45922356843948364,
+ "learning_rate": 0.00019887719298245616,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.5743,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 60
+ },
+ {
+ "epoch": 0.53,
+ "grad_norm": 79.92957305908203,
+ "kl": 0.0,
+ "learning_rate": 0.0001960701754385965,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6108,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 80
+ },
+ {
+ "epoch": 0.66,
+ "grad_norm": 0.06103940308094025,
+ "kl": 0.0,
+ "learning_rate": 0.00019326315789473686,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.754,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 100
+ },
+ {
+ "epoch": 0.66,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2027.0018310546875,
+ "eval_logps/rejected": -1697.82177734375,
+ "eval_loss": 0.5000000596046448,
+ "eval_rewards/chosen": -174.75149536132812,
+ "eval_rewards/margins": -28.786863327026367,
+ "eval_rewards/rejected": -145.96463012695312,
+ "eval_runtime": 170.0562,
+ "eval_samples_per_second": 2.058,
+ "eval_steps_per_second": 0.517,
+ "step": 100
+ },
+ {
+ "epoch": 0.79,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.0001904561403508772,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.95,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 120
+ },
+ {
+ "epoch": 0.92,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00018764912280701756,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6274,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 140
+ },
+ {
+ "epoch": 0.99,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2237.81494140625,
+ "eval_logps/rejected": -1889.774169921875,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -195.83285522460938,
+ "eval_rewards/margins": -30.672954559326172,
+ "eval_rewards/rejected": -165.15989685058594,
+ "eval_runtime": 169.8795,
+ "eval_samples_per_second": 2.06,
+ "eval_steps_per_second": 0.518,
+ "step": 150
+ },
+ {
+ "epoch": 1.06,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.0001848421052631579,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6387,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 160
+ },
+ {
+ "epoch": 1.19,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00018203508771929826,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8327,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 180
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00017922807017543862,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.642,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 200
+ },
+ {
+ "epoch": 1.32,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2230.916259765625,
+ "eval_logps/rejected": -1884.9520263671875,
+ "eval_loss": 0.5000000596046448,
+ "eval_rewards/chosen": -195.14297485351562,
+ "eval_rewards/margins": -30.465293884277344,
+ "eval_rewards/rejected": -164.67767333984375,
+ "eval_runtime": 170.1489,
+ "eval_samples_per_second": 2.057,
+ "eval_steps_per_second": 0.517,
+ "step": 200
+ },
+ {
+ "epoch": 1.45,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00017642105263157896,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7493,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 220
+ },
+ {
+ "epoch": 1.58,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.0001736140350877193,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6241,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 240
+ },
+ {
+ "epoch": 1.65,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2230.957275390625,
+ "eval_logps/rejected": -1885.0225830078125,
+ "eval_loss": 0.5000000596046448,
+ "eval_rewards/chosen": -195.14706420898438,
+ "eval_rewards/margins": -30.462318420410156,
+ "eval_rewards/rejected": -164.68475341796875,
+ "eval_runtime": 170.1092,
+ "eval_samples_per_second": 2.058,
+ "eval_steps_per_second": 0.517,
+ "step": 250
+ },
+ {
+ "epoch": 1.72,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00017080701754385965,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.9621,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 260
+ },
+ {
+ "epoch": 1.85,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.000168,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7279,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 280
+ },
+ {
+ "epoch": 1.98,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00016519298245614035,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7477,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 300
+ },
+ {
+ "epoch": 1.98,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2238.164306640625,
+ "eval_logps/rejected": -1890.7996826171875,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -195.86773681640625,
+ "eval_rewards/margins": -30.605329513549805,
+ "eval_rewards/rejected": -165.26242065429688,
+ "eval_runtime": 170.0647,
+ "eval_samples_per_second": 2.058,
+ "eval_steps_per_second": 0.517,
+ "step": 300
+ },
+ {
+ "epoch": 2.11,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00016238596491228072,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.7111,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 320
+ },
+ {
+ "epoch": 2.24,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00015957894736842105,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.8685,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 340
+ },
+ {
+ "epoch": 2.31,
+ "eval_kl": 0.0,
+ "eval_logps/chosen": -2238.054931640625,
+ "eval_logps/rejected": -1890.694580078125,
+ "eval_loss": 0.5,
+ "eval_rewards/chosen": -195.85682678222656,
+ "eval_rewards/margins": -30.604921340942383,
+ "eval_rewards/rejected": -165.2519073486328,
+ "eval_runtime": 170.0528,
+ "eval_samples_per_second": 2.058,
+ "eval_steps_per_second": 0.517,
+ "step": 350
+ },
+ {
+ "epoch": 2.38,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00015677192982456142,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.6905,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 360
+ },
+ {
+ "epoch": 2.51,
+ "grad_norm": 0.0,
+ "kl": 0.0,
+ "learning_rate": 0.00015396491228070175,
+ "logps/chosen": NaN,
+ "logps/rejected": NaN,
+ "loss": 0.736,
+ "rewards/chosen": NaN,
+ "rewards/margins": NaN,
+ "rewards/rejected": NaN,
+ "step": 380
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 0.0,
359
+ "kl": 0.0,
360
+ "learning_rate": 0.00015115789473684211,
361
+ "logps/chosen": NaN,
362
+ "logps/rejected": NaN,
363
+ "loss": 0.693,
364
+ "rewards/chosen": NaN,
365
+ "rewards/margins": NaN,
366
+ "rewards/rejected": NaN,
367
+ "step": 400
368
+ },
369
+ {
370
+ "epoch": 2.64,
371
+ "eval_kl": 0.0,
372
+ "eval_logps/chosen": -2237.827392578125,
373
+ "eval_logps/rejected": -1890.5028076171875,
374
+ "eval_loss": 0.5,
375
+ "eval_rewards/chosen": -195.83407592773438,
376
+ "eval_rewards/margins": -30.601318359375,
377
+ "eval_rewards/rejected": -165.23275756835938,
378
+ "eval_runtime": 170.2445,
379
+ "eval_samples_per_second": 2.056,
380
+ "eval_steps_per_second": 0.517,
381
+ "step": 400
382
+ },
383
+ {
384
+ "epoch": 2.77,
385
+ "grad_norm": 8.788210266175156e-07,
386
+ "kl": 0.0,
387
+ "learning_rate": 0.00014835087719298245,
388
+ "logps/chosen": NaN,
389
+ "logps/rejected": NaN,
390
+ "loss": 0.8652,
391
+ "rewards/chosen": NaN,
392
+ "rewards/margins": NaN,
393
+ "rewards/rejected": NaN,
394
+ "step": 420
395
+ },
396
+ {
397
+ "epoch": 2.9,
398
+ "grad_norm": 0.0,
399
+ "kl": 0.0,
400
+ "learning_rate": 0.0001455438596491228,
401
+ "logps/chosen": NaN,
402
+ "logps/rejected": NaN,
403
+ "loss": 0.686,
404
+ "rewards/chosen": NaN,
405
+ "rewards/margins": NaN,
406
+ "rewards/rejected": NaN,
407
+ "step": 440
408
+ },
409
+ {
410
+ "epoch": 2.97,
411
+ "eval_kl": 0.0,
412
+ "eval_logps/chosen": -2237.722412109375,
413
+ "eval_logps/rejected": -1890.4027099609375,
414
+ "eval_loss": 0.5,
415
+ "eval_rewards/chosen": -195.82354736328125,
416
+ "eval_rewards/margins": -30.600812911987305,
417
+ "eval_rewards/rejected": -165.22274780273438,
418
+ "eval_runtime": 170.3429,
419
+ "eval_samples_per_second": 2.055,
420
+ "eval_steps_per_second": 0.517,
421
+ "step": 450
422
+ },
423
+ {
424
+ "epoch": 3.04,
425
+ "grad_norm": 0.0,
426
+ "kl": 0.0,
427
+ "learning_rate": 0.00014273684210526318,
428
+ "logps/chosen": NaN,
429
+ "logps/rejected": NaN,
430
+ "loss": 0.6858,
431
+ "rewards/chosen": NaN,
432
+ "rewards/margins": NaN,
433
+ "rewards/rejected": NaN,
434
+ "step": 460
435
+ },
436
+ {
437
+ "epoch": 3.17,
438
+ "grad_norm": 0.0,
439
+ "kl": 0.0,
440
+ "learning_rate": 0.0001399298245614035,
441
+ "logps/chosen": NaN,
442
+ "logps/rejected": NaN,
443
+ "loss": 0.8479,
444
+ "rewards/chosen": NaN,
445
+ "rewards/margins": NaN,
446
+ "rewards/rejected": NaN,
447
+ "step": 480
448
+ },
449
+ {
450
+ "epoch": 3.3,
451
+ "grad_norm": 0.0,
452
+ "kl": 0.0,
453
+ "learning_rate": 0.00013712280701754388,
454
+ "logps/chosen": NaN,
455
+ "logps/rejected": NaN,
456
+ "loss": 0.6119,
457
+ "rewards/chosen": NaN,
458
+ "rewards/margins": NaN,
459
+ "rewards/rejected": NaN,
460
+ "step": 500
461
+ },
462
+ {
463
+ "epoch": 3.3,
464
+ "eval_kl": 0.0,
465
+ "eval_logps/chosen": -2237.6083984375,
466
+ "eval_logps/rejected": -1890.3140869140625,
467
+ "eval_loss": 0.5,
468
+ "eval_rewards/chosen": -195.81216430664062,
469
+ "eval_rewards/margins": -30.598268508911133,
470
+ "eval_rewards/rejected": -165.21388244628906,
471
+ "eval_runtime": 169.9488,
472
+ "eval_samples_per_second": 2.059,
473
+ "eval_steps_per_second": 0.518,
474
+ "step": 500
475
+ },
476
+ {
477
+ "epoch": 3.43,
478
+ "grad_norm": 0.0,
479
+ "kl": 0.0,
480
+ "learning_rate": 0.0001343157894736842,
481
+ "logps/chosen": NaN,
482
+ "logps/rejected": NaN,
483
+ "loss": 0.7107,
484
+ "rewards/chosen": NaN,
485
+ "rewards/margins": NaN,
486
+ "rewards/rejected": NaN,
487
+ "step": 520
488
+ },
489
+ {
490
+ "epoch": 3.56,
491
+ "grad_norm": 0.0,
492
+ "kl": 0.0,
493
+ "learning_rate": 0.00013150877192982455,
494
+ "logps/chosen": NaN,
495
+ "logps/rejected": NaN,
496
+ "loss": 0.5902,
497
+ "rewards/chosen": NaN,
498
+ "rewards/margins": NaN,
499
+ "rewards/rejected": NaN,
500
+ "step": 540
501
+ },
502
+ {
503
+ "epoch": 3.63,
504
+ "eval_kl": 0.0,
505
+ "eval_logps/chosen": -2237.56494140625,
506
+ "eval_logps/rejected": -1890.3043212890625,
507
+ "eval_loss": 0.5,
508
+ "eval_rewards/chosen": -195.80784606933594,
509
+ "eval_rewards/margins": -30.59491539001465,
510
+ "eval_rewards/rejected": -165.21290588378906,
511
+ "eval_runtime": 169.9756,
512
+ "eval_samples_per_second": 2.059,
513
+ "eval_steps_per_second": 0.518,
514
+ "step": 550
515
+ },
516
+ {
517
+ "epoch": 3.7,
518
+ "grad_norm": 0.0,
519
+ "kl": 0.0,
520
+ "learning_rate": 0.0001287017543859649,
521
+ "logps/chosen": NaN,
522
+ "logps/rejected": NaN,
523
+ "loss": 0.9042,
524
+ "rewards/chosen": NaN,
525
+ "rewards/margins": NaN,
526
+ "rewards/rejected": NaN,
527
+ "step": 560
528
+ },
529
+ {
530
+ "epoch": 3.83,
531
+ "grad_norm": 0.0,
532
+ "kl": 0.0,
533
+ "learning_rate": 0.00012589473684210527,
534
+ "logps/chosen": NaN,
535
+ "logps/rejected": NaN,
536
+ "loss": 0.7268,
537
+ "rewards/chosen": NaN,
538
+ "rewards/margins": NaN,
539
+ "rewards/rejected": NaN,
540
+ "step": 580
541
+ },
542
+ {
543
+ "epoch": 3.96,
544
+ "grad_norm": 0.0,
545
+ "kl": 0.0,
546
+ "learning_rate": 0.00012308771929824564,
547
+ "logps/chosen": NaN,
548
+ "logps/rejected": NaN,
549
+ "loss": 0.7106,
550
+ "rewards/chosen": NaN,
551
+ "rewards/margins": NaN,
552
+ "rewards/rejected": NaN,
553
+ "step": 600
554
+ },
555
+ {
556
+ "epoch": 3.96,
557
+ "eval_kl": 0.0,
558
+ "eval_logps/chosen": -2241.97509765625,
559
+ "eval_logps/rejected": -1893.87646484375,
560
+ "eval_loss": 0.5,
561
+ "eval_rewards/chosen": -196.24884033203125,
562
+ "eval_rewards/margins": -30.67871856689453,
563
+ "eval_rewards/rejected": -165.57012939453125,
564
+ "eval_runtime": 169.9427,
565
+ "eval_samples_per_second": 2.06,
566
+ "eval_steps_per_second": 0.518,
567
+ "step": 600
568
+ },
569
+ {
570
+ "epoch": 4.09,
571
+ "grad_norm": 0.0,
572
+ "kl": 0.0,
573
+ "learning_rate": 0.00012028070175438597,
574
+ "logps/chosen": NaN,
575
+ "logps/rejected": NaN,
576
+ "loss": 0.6829,
577
+ "rewards/chosen": NaN,
578
+ "rewards/margins": NaN,
579
+ "rewards/rejected": NaN,
580
+ "step": 620
581
+ },
582
+ {
583
+ "epoch": 4.22,
584
+ "grad_norm": 0.0,
585
+ "kl": 0.0,
586
+ "learning_rate": 0.00011747368421052631,
587
+ "logps/chosen": NaN,
588
+ "logps/rejected": NaN,
589
+ "loss": 0.8232,
590
+ "rewards/chosen": NaN,
591
+ "rewards/margins": NaN,
592
+ "rewards/rejected": NaN,
593
+ "step": 640
594
+ },
595
+ {
596
+ "epoch": 4.29,
597
+ "eval_kl": 0.0,
598
+ "eval_logps/chosen": -2241.91552734375,
599
+ "eval_logps/rejected": -1893.757080078125,
600
+ "eval_loss": 0.5,
601
+ "eval_rewards/chosen": -196.24290466308594,
602
+ "eval_rewards/margins": -30.684709548950195,
603
+ "eval_rewards/rejected": -165.55816650390625,
604
+ "eval_runtime": 169.9605,
605
+ "eval_samples_per_second": 2.059,
606
+ "eval_steps_per_second": 0.518,
607
+ "step": 650
608
+ },
609
+ {
610
+ "epoch": 4.36,
611
+ "grad_norm": 0.0,
612
+ "kl": 0.0,
613
+ "learning_rate": 0.00011466666666666667,
614
+ "logps/chosen": -2123.240234375,
615
+ "logps/rejected": NaN,
616
+ "loss": 0.6315,
617
+ "rewards/chosen": -188.09486389160156,
618
+ "rewards/margins": NaN,
619
+ "rewards/rejected": NaN,
620
+ "step": 660
621
+ },
622
+ {
623
+ "epoch": 4.49,
624
+ "grad_norm": 0.0,
625
+ "kl": 0.0,
626
+ "learning_rate": 0.00011185964912280702,
627
+ "logps/chosen": NaN,
628
+ "logps/rejected": NaN,
629
+ "loss": 0.7998,
630
+ "rewards/chosen": NaN,
631
+ "rewards/margins": NaN,
632
+ "rewards/rejected": NaN,
633
+ "step": 680
634
+ },
635
+ {
636
+ "epoch": 4.62,
637
+ "grad_norm": 0.0,
638
+ "kl": 0.0,
639
+ "learning_rate": 0.00010905263157894738,
640
+ "logps/chosen": NaN,
641
+ "logps/rejected": NaN,
642
+ "loss": 0.5881,
643
+ "rewards/chosen": NaN,
644
+ "rewards/margins": NaN,
645
+ "rewards/rejected": NaN,
646
+ "step": 700
647
+ },
648
+ {
649
+ "epoch": 4.62,
650
+ "eval_kl": 0.0,
651
+ "eval_logps/chosen": -2251.134033203125,
652
+ "eval_logps/rejected": -1901.2047119140625,
653
+ "eval_loss": 0.5,
654
+ "eval_rewards/chosen": -197.1647491455078,
655
+ "eval_rewards/margins": -30.8618221282959,
656
+ "eval_rewards/rejected": -166.30291748046875,
657
+ "eval_runtime": 169.9704,
658
+ "eval_samples_per_second": 2.059,
659
+ "eval_steps_per_second": 0.518,
660
+ "step": 700
661
+ },
662
+ {
663
+ "epoch": 4.75,
664
+ "grad_norm": 0.0,
665
+ "kl": 0.0,
666
+ "learning_rate": 0.00010624561403508772,
667
+ "logps/chosen": NaN,
668
+ "logps/rejected": NaN,
669
+ "loss": 0.8756,
670
+ "rewards/chosen": NaN,
671
+ "rewards/margins": NaN,
672
+ "rewards/rejected": NaN,
673
+ "step": 720
674
+ },
675
+ {
676
+ "epoch": 4.88,
677
+ "grad_norm": 0.0,
678
+ "kl": 0.0,
679
+ "learning_rate": 0.00010343859649122807,
680
+ "logps/chosen": NaN,
681
+ "logps/rejected": NaN,
682
+ "loss": 0.6156,
683
+ "rewards/chosen": NaN,
684
+ "rewards/margins": NaN,
685
+ "rewards/rejected": NaN,
686
+ "step": 740
687
+ },
688
+ {
689
+ "epoch": 4.95,
690
+ "eval_kl": 0.0,
691
+ "eval_logps/chosen": -2250.90234375,
692
+ "eval_logps/rejected": -1901.0179443359375,
693
+ "eval_loss": 0.5,
694
+ "eval_rewards/chosen": -197.1415557861328,
695
+ "eval_rewards/margins": -30.857322692871094,
696
+ "eval_rewards/rejected": -166.28424072265625,
697
+ "eval_runtime": 169.9616,
698
+ "eval_samples_per_second": 2.059,
699
+ "eval_steps_per_second": 0.518,
700
+ "step": 750
701
+ },
702
+ {
703
+ "epoch": 5.02,
704
+ "grad_norm": 0.0,
705
+ "kl": 0.0,
706
+ "learning_rate": 0.00010063157894736843,
707
+ "logps/chosen": NaN,
708
+ "logps/rejected": NaN,
709
+ "loss": 0.7376,
710
+ "rewards/chosen": NaN,
711
+ "rewards/margins": NaN,
712
+ "rewards/rejected": NaN,
713
+ "step": 760
714
+ },
715
+ {
716
+ "epoch": 5.15,
717
+ "grad_norm": 0.0,
718
+ "kl": 0.0,
719
+ "learning_rate": 9.782456140350877e-05,
720
+ "logps/chosen": NaN,
721
+ "logps/rejected": NaN,
722
+ "loss": 0.7998,
723
+ "rewards/chosen": NaN,
724
+ "rewards/margins": NaN,
725
+ "rewards/rejected": NaN,
726
+ "step": 780
727
+ },
728
+ {
729
+ "epoch": 5.28,
730
+ "grad_norm": 0.0,
731
+ "kl": 0.0,
732
+ "learning_rate": 9.501754385964913e-05,
733
+ "logps/chosen": NaN,
734
+ "logps/rejected": NaN,
735
+ "loss": 0.6291,
736
+ "rewards/chosen": NaN,
737
+ "rewards/margins": NaN,
738
+ "rewards/rejected": NaN,
739
+ "step": 800
740
+ },
741
+ {
742
+ "epoch": 5.28,
743
+ "eval_kl": 0.0,
744
+ "eval_logps/chosen": -2250.995849609375,
745
+ "eval_logps/rejected": -1901.1036376953125,
746
+ "eval_loss": 0.5,
747
+ "eval_rewards/chosen": -197.15087890625,
748
+ "eval_rewards/margins": -30.858049392700195,
749
+ "eval_rewards/rejected": -166.29283142089844,
750
+ "eval_runtime": 169.941,
751
+ "eval_samples_per_second": 2.06,
752
+ "eval_steps_per_second": 0.518,
753
+ "step": 800
754
+ },
755
+ {
756
+ "epoch": 5.41,
757
+ "grad_norm": 0.0,
758
+ "kl": 0.0,
759
+ "learning_rate": 9.221052631578948e-05,
760
+ "logps/chosen": NaN,
761
+ "logps/rejected": NaN,
762
+ "loss": 0.7167,
763
+ "rewards/chosen": NaN,
764
+ "rewards/margins": NaN,
765
+ "rewards/rejected": NaN,
766
+ "step": 820
767
+ },
768
+ {
769
+ "epoch": 5.54,
770
+ "grad_norm": 0.0,
771
+ "kl": 0.0,
772
+ "learning_rate": 8.940350877192983e-05,
773
+ "logps/chosen": NaN,
774
+ "logps/rejected": NaN,
775
+ "loss": 0.6285,
776
+ "rewards/chosen": NaN,
777
+ "rewards/margins": NaN,
778
+ "rewards/rejected": NaN,
779
+ "step": 840
780
+ },
781
+ {
782
+ "epoch": 5.61,
783
+ "eval_kl": 0.0,
784
+ "eval_logps/chosen": -2251.08837890625,
785
+ "eval_logps/rejected": -1901.1571044921875,
786
+ "eval_loss": 0.5,
787
+ "eval_rewards/chosen": -197.16017150878906,
788
+ "eval_rewards/margins": -30.86201286315918,
789
+ "eval_rewards/rejected": -166.2981719970703,
790
+ "eval_runtime": 169.9583,
791
+ "eval_samples_per_second": 2.059,
792
+ "eval_steps_per_second": 0.518,
793
+ "step": 850
794
+ },
795
+ {
796
+ "epoch": 5.68,
797
+ "grad_norm": 0.0,
798
+ "kl": 0.0,
799
+ "learning_rate": 8.659649122807018e-05,
800
+ "logps/chosen": NaN,
801
+ "logps/rejected": NaN,
802
+ "loss": 0.7898,
803
+ "rewards/chosen": NaN,
804
+ "rewards/margins": NaN,
805
+ "rewards/rejected": NaN,
806
+ "step": 860
807
+ },
808
+ {
809
+ "epoch": 5.81,
810
+ "grad_norm": 0.0,
811
+ "kl": 0.0,
812
+ "learning_rate": 8.378947368421053e-05,
813
+ "logps/chosen": NaN,
814
+ "logps/rejected": NaN,
815
+ "loss": 0.8174,
816
+ "rewards/chosen": NaN,
817
+ "rewards/margins": NaN,
818
+ "rewards/rejected": NaN,
819
+ "step": 880
820
+ },
821
+ {
822
+ "epoch": 5.94,
823
+ "grad_norm": 0.0,
824
+ "kl": 0.0,
825
+ "learning_rate": 8.098245614035088e-05,
826
+ "logps/chosen": NaN,
827
+ "logps/rejected": NaN,
828
+ "loss": 0.6918,
829
+ "rewards/chosen": NaN,
830
+ "rewards/margins": NaN,
831
+ "rewards/rejected": NaN,
832
+ "step": 900
833
+ },
834
+ {
835
+ "epoch": 5.94,
836
+ "eval_kl": 0.0,
837
+ "eval_logps/chosen": -2251.1103515625,
838
+ "eval_logps/rejected": -1901.1773681640625,
839
+ "eval_loss": 0.5,
840
+ "eval_rewards/chosen": -197.16233825683594,
841
+ "eval_rewards/margins": -30.86213493347168,
842
+ "eval_rewards/rejected": -166.30018615722656,
843
+ "eval_runtime": 170.0642,
844
+ "eval_samples_per_second": 2.058,
845
+ "eval_steps_per_second": 0.517,
846
+ "step": 900
847
+ },
848
+ {
849
+ "epoch": 6.07,
850
+ "grad_norm": 0.0,
851
+ "kl": 0.0,
852
+ "learning_rate": 7.817543859649124e-05,
853
+ "logps/chosen": NaN,
854
+ "logps/rejected": NaN,
855
+ "loss": 0.6965,
856
+ "rewards/chosen": NaN,
857
+ "rewards/margins": NaN,
858
+ "rewards/rejected": NaN,
859
+ "step": 920
860
+ },
861
+ {
862
+ "epoch": 6.2,
863
+ "grad_norm": 0.0,
864
+ "kl": 0.0,
865
+ "learning_rate": 7.536842105263158e-05,
866
+ "logps/chosen": NaN,
867
+ "logps/rejected": NaN,
868
+ "loss": 0.7869,
869
+ "rewards/chosen": NaN,
870
+ "rewards/margins": NaN,
871
+ "rewards/rejected": NaN,
872
+ "step": 940
873
+ },
874
+ {
875
+ "epoch": 6.27,
876
+ "eval_kl": 0.0,
877
+ "eval_logps/chosen": -2251.116943359375,
878
+ "eval_logps/rejected": -1901.21484375,
879
+ "eval_loss": 0.5,
880
+ "eval_rewards/chosen": -197.16302490234375,
881
+ "eval_rewards/margins": -30.85906982421875,
882
+ "eval_rewards/rejected": -166.303955078125,
883
+ "eval_runtime": 169.9373,
884
+ "eval_samples_per_second": 2.06,
885
+ "eval_steps_per_second": 0.518,
886
+ "step": 950
887
+ },
888
+ {
889
+ "epoch": 6.34,
890
+ "grad_norm": 0.0,
891
+ "kl": 0.0,
892
+ "learning_rate": 7.256140350877193e-05,
893
+ "logps/chosen": NaN,
894
+ "logps/rejected": NaN,
895
+ "loss": 0.6402,
896
+ "rewards/chosen": NaN,
897
+ "rewards/margins": NaN,
898
+ "rewards/rejected": NaN,
899
+ "step": 960
900
+ },
901
+ {
902
+ "epoch": 6.47,
903
+ "grad_norm": 0.0,
904
+ "kl": 0.0,
905
+ "learning_rate": 6.975438596491229e-05,
906
+ "logps/chosen": NaN,
907
+ "logps/rejected": NaN,
908
+ "loss": 0.8122,
909
+ "rewards/chosen": NaN,
910
+ "rewards/margins": NaN,
911
+ "rewards/rejected": NaN,
912
+ "step": 980
913
+ },
914
+ {
915
+ "epoch": 6.6,
916
+ "grad_norm": 0.0,
917
+ "kl": 0.0,
918
+ "learning_rate": 6.694736842105264e-05,
919
+ "logps/chosen": -2150.89111328125,
920
+ "logps/rejected": NaN,
921
+ "loss": 0.5483,
922
+ "rewards/chosen": -190.25399780273438,
923
+ "rewards/margins": NaN,
924
+ "rewards/rejected": NaN,
925
+ "step": 1000
926
+ },
927
+ {
928
+ "epoch": 6.6,
929
+ "eval_kl": 0.0,
930
+ "eval_logps/chosen": -2251.134521484375,
931
+ "eval_logps/rejected": -1901.1729736328125,
932
+ "eval_loss": 0.5,
933
+ "eval_rewards/chosen": -197.16481018066406,
934
+ "eval_rewards/margins": -30.865028381347656,
935
+ "eval_rewards/rejected": -166.29977416992188,
936
+ "eval_runtime": 169.9607,
937
+ "eval_samples_per_second": 2.059,
938
+ "eval_steps_per_second": 0.518,
939
+ "step": 1000
940
+ },
941
+ {
942
+ "epoch": 6.73,
943
+ "grad_norm": 0.0,
944
+ "kl": 0.0,
945
+ "learning_rate": 6.414035087719299e-05,
946
+ "logps/chosen": NaN,
947
+ "logps/rejected": NaN,
948
+ "loss": 1.0998,
949
+ "rewards/chosen": NaN,
950
+ "rewards/margins": NaN,
951
+ "rewards/rejected": NaN,
952
+ "step": 1020
953
+ },
954
+ {
955
+ "epoch": 6.86,
956
+ "grad_norm": 0.0,
957
+ "kl": 0.0,
958
+ "learning_rate": 6.133333333333334e-05,
959
+ "logps/chosen": NaN,
960
+ "logps/rejected": NaN,
961
+ "loss": 0.7744,
962
+ "rewards/chosen": NaN,
963
+ "rewards/margins": NaN,
964
+ "rewards/rejected": NaN,
965
+ "step": 1040
966
+ },
967
+ {
968
+ "epoch": 6.93,
969
+ "eval_kl": 0.0,
970
+ "eval_logps/chosen": -2254.820068359375,
971
+ "eval_logps/rejected": -1904.1441650390625,
972
+ "eval_loss": 0.5,
973
+ "eval_rewards/chosen": -197.53334045410156,
974
+ "eval_rewards/margins": -30.936431884765625,
975
+ "eval_rewards/rejected": -166.59690856933594,
976
+ "eval_runtime": 169.9328,
977
+ "eval_samples_per_second": 2.06,
978
+ "eval_steps_per_second": 0.518,
979
+ "step": 1050
980
+ },
981
+ {
982
+ "epoch": 7.0,
983
+ "grad_norm": 0.0,
984
+ "kl": 0.0,
985
+ "learning_rate": 5.852631578947369e-05,
986
+ "logps/chosen": NaN,
987
+ "logps/rejected": NaN,
988
+ "loss": 0.7891,
989
+ "rewards/chosen": NaN,
990
+ "rewards/margins": NaN,
991
+ "rewards/rejected": NaN,
992
+ "step": 1060
993
+ },
994
+ {
995
+ "epoch": 7.13,
996
+ "grad_norm": 0.0,
997
+ "kl": 0.0,
998
+ "learning_rate": 5.571929824561404e-05,
999
+ "logps/chosen": NaN,
1000
+ "logps/rejected": NaN,
1001
+ "loss": 0.7203,
1002
+ "rewards/chosen": NaN,
1003
+ "rewards/margins": NaN,
1004
+ "rewards/rejected": NaN,
1005
+ "step": 1080
1006
+ },
1007
+ {
1008
+ "epoch": 7.26,
1009
+ "grad_norm": 0.0,
1010
+ "kl": 0.0,
1011
+ "learning_rate": 5.291228070175439e-05,
1012
+ "logps/chosen": NaN,
1013
+ "logps/rejected": NaN,
1014
+ "loss": 0.9077,
1015
+ "rewards/chosen": NaN,
1016
+ "rewards/margins": NaN,
1017
+ "rewards/rejected": NaN,
1018
+ "step": 1100
1019
+ },
1020
+ {
1021
+ "epoch": 7.26,
1022
+ "eval_kl": 0.0,
1023
+ "eval_logps/chosen": -2254.888427734375,
1024
+ "eval_logps/rejected": -1904.1827392578125,
1025
+ "eval_loss": 0.5,
1026
+ "eval_rewards/chosen": -197.54017639160156,
1027
+ "eval_rewards/margins": -30.939420700073242,
1028
+ "eval_rewards/rejected": -166.6007537841797,
1029
+ "eval_runtime": 169.9818,
1030
+ "eval_samples_per_second": 2.059,
1031
+ "eval_steps_per_second": 0.518,
1032
+ "step": 1100
1033
+ },
1034
+ {
1035
+ "epoch": 7.39,
1036
+ "grad_norm": 0.0,
1037
+ "kl": 0.0,
1038
+ "learning_rate": 5.010526315789474e-05,
1039
+ "logps/chosen": NaN,
1040
+ "logps/rejected": NaN,
1041
+ "loss": 0.6196,
1042
+ "rewards/chosen": NaN,
1043
+ "rewards/margins": NaN,
1044
+ "rewards/rejected": NaN,
1045
+ "step": 1120
1046
+ },
1047
+ {
1048
+ "epoch": 7.52,
1049
+ "grad_norm": 0.0,
1050
+ "kl": 0.0,
1051
+ "learning_rate": 4.729824561403509e-05,
1052
+ "logps/chosen": NaN,
1053
+ "logps/rejected": NaN,
1054
+ "loss": 0.664,
1055
+ "rewards/chosen": NaN,
1056
+ "rewards/margins": NaN,
1057
+ "rewards/rejected": NaN,
1058
+ "step": 1140
1059
+ },
1060
+ {
1061
+ "epoch": 7.59,
1062
+ "eval_kl": 0.0,
1063
+ "eval_logps/chosen": -2252.107421875,
1064
+ "eval_logps/rejected": -1901.9637451171875,
1065
+ "eval_loss": 0.5,
1066
+ "eval_rewards/chosen": -197.26206970214844,
1067
+ "eval_rewards/margins": -30.88323974609375,
1068
+ "eval_rewards/rejected": -166.37884521484375,
1069
+ "eval_runtime": 169.9835,
1070
+ "eval_samples_per_second": 2.059,
1071
+ "eval_steps_per_second": 0.518,
1072
+ "step": 1150
1073
+ },
1074
+ {
1075
+ "epoch": 7.66,
1076
+ "grad_norm": 0.0,
1077
+ "kl": 0.0,
1078
+ "learning_rate": 4.449122807017544e-05,
1079
+ "logps/chosen": NaN,
1080
+ "logps/rejected": NaN,
1081
+ "loss": 0.7551,
1082
+ "rewards/chosen": NaN,
1083
+ "rewards/margins": NaN,
1084
+ "rewards/rejected": NaN,
1085
+ "step": 1160
1086
+ },
1087
+ {
1088
+ "epoch": 7.79,
1089
+ "grad_norm": 0.0,
1090
+ "kl": 0.0,
1091
+ "learning_rate": 4.168421052631579e-05,
1092
+ "logps/chosen": NaN,
1093
+ "logps/rejected": NaN,
1094
+ "loss": 0.9193,
1095
+ "rewards/chosen": NaN,
1096
+ "rewards/margins": NaN,
1097
+ "rewards/rejected": NaN,
1098
+ "step": 1180
1099
+ },
1100
+ {
1101
+ "epoch": 7.92,
1102
+ "grad_norm": 0.0,
1103
+ "kl": 0.0,
1104
+ "learning_rate": 3.887719298245614e-05,
1105
+ "logps/chosen": NaN,
1106
+ "logps/rejected": NaN,
1107
+ "loss": 0.6126,
1108
+ "rewards/chosen": NaN,
1109
+ "rewards/margins": NaN,
1110
+ "rewards/rejected": NaN,
1111
+ "step": 1200
1112
+ },
1113
+ {
1114
+ "epoch": 7.92,
1115
+ "eval_kl": 0.0,
1116
+ "eval_logps/chosen": -2251.9697265625,
1117
+ "eval_logps/rejected": -1901.8804931640625,
1118
+ "eval_loss": 0.5,
1119
+ "eval_rewards/chosen": -197.24830627441406,
1120
+ "eval_rewards/margins": -30.877805709838867,
1121
+ "eval_rewards/rejected": -166.37051391601562,
1122
+ "eval_runtime": 169.9515,
1123
+ "eval_samples_per_second": 2.059,
1124
+ "eval_steps_per_second": 0.518,
1125
+ "step": 1200
1126
+ }
1127
+ ],
1128
+ "logging_steps": 20,
1129
+ "max_steps": 1470,
1130
+ "num_input_tokens_seen": 0,
1131
+ "num_train_epochs": 10,
1132
+ "save_steps": 100,
1133
+ "total_flos": 0.0,
1134
+ "train_batch_size": 4,
1135
+ "trial_name": null,
1136
+ "trial_params": null
1137
+ }
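The `trainer_state.json` shown above interleaves periodic training entries (every `logging_steps: 20`) with evaluation entries (those carrying `eval_loss`), and many of its reward metrics are bare `NaN` tokens, which strict JSON forbids but Python's `json` module accepts by default. As a minimal sketch (the `sample`, `best`, and `nan_steps` names are illustrative, with values taken from the log above), such a file can be scanned to find the best eval checkpoint and to flag steps whose metrics diverged:

```python
import json
import math

# Illustrative excerpt mirroring the trainer_state.json structure above.
# Note the bare NaN tokens: invalid in strict JSON, but json.loads
# accepts them by default (via its parse_constant handling).
sample = '''
{
  "log_history": [
    {"epoch": 1.58, "loss": 0.6241, "rewards/margins": NaN, "step": 240},
    {"epoch": 1.65, "eval_loss": 0.5000000596046448, "step": 250},
    {"epoch": 7.92, "eval_loss": 0.5, "step": 1200}
  ]
}
'''

state = json.loads(sample)

# Evaluation entries are the ones that carry an eval_loss key.
evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])

# Flag training steps whose metrics went NaN, as in the run above.
nan_steps = [
    e["step"]
    for e in state["log_history"]
    if any(isinstance(v, float) and math.isnan(v) for v in e.values())
]

print(best["step"], nan_steps)
```

With the full log above, this kind of scan would surface that `rewards/chosen`, `rewards/margins`, and `rewards/rejected` are NaN for nearly every logged training step while `eval_loss` stays pinned at 0.5, a strong sign the run diverged early.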
checkpoint-1200/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ae5309801a19049c58de3649400afbb558334e14e33dff69ca022789cf2400ea
3
+ size 5688
checkpoint-1300/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ library_name: peft
3
+ base_model: HuggingFaceH4/zephyr-7b-beta
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.10.0