emonnsl committed on
Commit ae0aef2
1 Parent(s): aba4cd7

Upload 7 files
README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: hishab/titulm-1b-bn-v1
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.11.1
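The card's quick-start section is still a placeholder. A minimal loading sketch for a PEFT LoRA adapter like this one might look as follows; the adapter repo id passed in is hypothetical, and this assumes `transformers` and `peft` (0.11.x) are installed. Imports are kept inside the function so the sketch reads standalone.

```python
def load_adapter(adapter_repo_id: str):
    """Sketch: load the base model plus a LoRA adapter.

    `adapter_repo_id` is a hypothetical repo id for this adapter;
    AutoPeftModelForCausalLM resolves the base model from the
    adapter_config.json ("base_model_name_or_path") automatically.
    """
    from transformers import AutoTokenizer
    from peft import AutoPeftModelForCausalLM

    model = AutoPeftModelForCausalLM.from_pretrained(adapter_repo_id)
    # The tokenizer comes from the base model recorded in the adapter config.
    tokenizer = AutoTokenizer.from_pretrained("hishab/titulm-1b-bn-v1")
    return model, tokenizer
```

The function is not called here, since it would download the base checkpoint.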
adapter_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "hishab/titulm-1b-bn-v1",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "down_proj",
+ "out_proj",
+ "up_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
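Two quantities follow directly from this config: the LoRA scaling factor is `lora_alpha / r` (32 / 16 = 2.0), and each targeted projection of shape `(d_out, d_in)` adds `r * (d_in + d_out)` trainable parameters (an `(r, d_in)` matrix A and a `(d_out, r)` matrix B). A small sketch; the projection shapes used below are illustrative, not read from the base model:

```python
config = {"r": 16, "lora_alpha": 32,
          "target_modules": ["down_proj", "out_proj", "up_proj"]}

# Multiplier applied to the low-rank update B @ A.
scaling = config["lora_alpha"] / config["r"]

def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters one LoRA pair adds to a (d_out, d_in) projection."""
    return r * (d_in + d_out)  # A: (r, d_in), B: (d_out, r)

# Hypothetical shapes for one decoder-layer projection, for illustration only.
example = lora_params(d_out=5504, d_in=2048, r=config["r"])
print(scaling)  # 2.0
print(example)
```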
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a3b4361eea93a0bda321ad452248bbccc466147cd1696128e8396206dc26b99
+ size 37768408
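The three lines above are a Git LFS pointer, not the tensor data itself: the ~38 MB safetensors file lives in LFS storage, keyed by the SHA-256 oid, and the pointer is what the git diff actually records. A small parser sketch for this pointer format:

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6a3b4361eea93a0bda321ad452248bbccc466147cd1696128e8396206dc26b99
size 37768408
"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a git-lfs pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "hash_algo": algo,
            "digest": digest, "size_bytes": int(fields["size"])}

info = parse_lfs_pointer(POINTER)
print(info["size_bytes"])  # 37768408
```

The same format applies to the rng_state.pth and scheduler.pt pointers below.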
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff09c451a10d4088bb54a29920ab4c7f6bfe1e15f4fa7f70303d92cee153f304
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1da0efa6bd078f65dc649d4e90d30aba6d462a8362459d030146b3c40a5e6c58
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,2253 @@
+ {
+ "best_metric": 1.9664931297302246,
+ "best_model_checkpoint": "./lora_bn_resume/checkpoint-3000",
+ "epoch": 1.9292604501607717,
+ "eval_steps": 200,
+ "global_step": 3000,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.006430868167202572,
+ "grad_norm": 0.7529953718185425,
+ "learning_rate": 2.9999999999999997e-05,
+ "loss": 2.01,
+ "step": 10
+ },
+ {
+ "epoch": 0.012861736334405145,
+ "grad_norm": 0.8143910765647888,
+ "learning_rate": 5.9999999999999995e-05,
+ "loss": 1.9794,
+ "step": 20
+ },
+ {
+ "epoch": 0.01929260450160772,
+ "grad_norm": 0.7554563283920288,
+ "learning_rate": 8.999999999999999e-05,
+ "loss": 1.9687,
+ "step": 30
+ },
+ {
+ "epoch": 0.02572347266881029,
+ "grad_norm": 0.701172411441803,
+ "learning_rate": 0.00011999999999999999,
+ "loss": 2.0374,
+ "step": 40
+ },
+ {
+ "epoch": 0.03215434083601286,
+ "grad_norm": 0.7426002621650696,
+ "learning_rate": 0.00015,
+ "loss": 1.8484,
+ "step": 50
+ },
+ {
+ "epoch": 0.03858520900321544,
+ "grad_norm": 0.7900332808494568,
+ "learning_rate": 0.00017999999999999998,
+ "loss": 1.91,
+ "step": 60
+ },
+ {
+ "epoch": 0.04501607717041801,
+ "grad_norm": 0.7825136184692383,
+ "learning_rate": 0.00020999999999999998,
+ "loss": 1.9625,
+ "step": 70
+ },
+ {
+ "epoch": 0.05144694533762058,
+ "grad_norm": 0.9338003993034363,
+ "learning_rate": 0.00023999999999999998,
+ "loss": 1.9668,
+ "step": 80
+ },
+ {
+ "epoch": 0.05787781350482315,
+ "grad_norm": 0.8660485148429871,
+ "learning_rate": 0.00027,
+ "loss": 2.0447,
+ "step": 90
+ },
+ {
+ "epoch": 0.06430868167202572,
+ "grad_norm": 0.8631746768951416,
+ "learning_rate": 0.0003,
+ "loss": 2.0347,
+ "step": 100
+ },
+ {
+ "epoch": 0.0707395498392283,
+ "grad_norm": 0.9202760457992554,
+ "learning_rate": 0.00029934282584884994,
+ "loss": 2.0218,
+ "step": 110
+ },
+ {
+ "epoch": 0.07717041800643087,
+ "grad_norm": 0.8508992791175842,
+ "learning_rate": 0.00029868565169769985,
+ "loss": 1.9808,
+ "step": 120
+ },
+ {
+ "epoch": 0.08360128617363344,
+ "grad_norm": 0.9962050914764404,
+ "learning_rate": 0.0002980284775465498,
+ "loss": 1.9586,
+ "step": 130
+ },
+ {
+ "epoch": 0.09003215434083602,
+ "grad_norm": 0.9159810543060303,
+ "learning_rate": 0.00029737130339539973,
+ "loss": 2.0257,
+ "step": 140
+ },
+ {
+ "epoch": 0.09646302250803858,
+ "grad_norm": 0.8135138750076294,
+ "learning_rate": 0.0002967141292442497,
+ "loss": 2.0103,
+ "step": 150
+ },
+ {
+ "epoch": 0.10289389067524116,
+ "grad_norm": 0.7933633327484131,
+ "learning_rate": 0.00029605695509309966,
+ "loss": 2.028,
+ "step": 160
+ },
+ {
+ "epoch": 0.10932475884244373,
+ "grad_norm": 0.9258368611335754,
+ "learning_rate": 0.00029539978094194957,
+ "loss": 2.0654,
+ "step": 170
+ },
+ {
+ "epoch": 0.1157556270096463,
+ "grad_norm": 0.8758969902992249,
+ "learning_rate": 0.00029474260679079954,
+ "loss": 1.9928,
+ "step": 180
+ },
+ {
+ "epoch": 0.12218649517684887,
+ "grad_norm": 0.8316165804862976,
+ "learning_rate": 0.00029408543263964945,
+ "loss": 1.9748,
+ "step": 190
+ },
+ {
+ "epoch": 0.12861736334405144,
+ "grad_norm": 0.8353763222694397,
+ "learning_rate": 0.0002934282584884994,
+ "loss": 2.0167,
+ "step": 200
+ },
+ {
+ "epoch": 0.12861736334405144,
+ "eval_loss": 2.0699551105499268,
+ "eval_runtime": 131.8406,
+ "eval_samples_per_second": 15.17,
+ "eval_steps_per_second": 1.896,
+ "step": 200
+ },
+ {
+ "epoch": 0.13504823151125403,
+ "grad_norm": 0.8024882078170776,
+ "learning_rate": 0.0002927710843373494,
+ "loss": 2.1039,
+ "step": 210
+ },
+ {
+ "epoch": 0.1414790996784566,
+ "grad_norm": 0.861377477645874,
+ "learning_rate": 0.0002921139101861993,
+ "loss": 2.023,
+ "step": 220
+ },
+ {
+ "epoch": 0.14790996784565916,
+ "grad_norm": 0.8247071504592896,
+ "learning_rate": 0.00029145673603504926,
+ "loss": 1.9341,
+ "step": 230
+ },
+ {
+ "epoch": 0.15434083601286175,
+ "grad_norm": 0.8182681202888489,
+ "learning_rate": 0.0002907995618838992,
+ "loss": 2.0137,
+ "step": 240
+ },
+ {
+ "epoch": 0.1607717041800643,
+ "grad_norm": 0.8556217551231384,
+ "learning_rate": 0.00029014238773274913,
+ "loss": 2.0638,
+ "step": 250
+ },
+ {
+ "epoch": 0.16720257234726688,
+ "grad_norm": 0.7721512913703918,
+ "learning_rate": 0.0002894852135815991,
+ "loss": 2.0061,
+ "step": 260
+ },
+ {
+ "epoch": 0.17363344051446947,
+ "grad_norm": 0.7948784828186035,
+ "learning_rate": 0.000288828039430449,
+ "loss": 1.9751,
+ "step": 270
+ },
+ {
+ "epoch": 0.18006430868167203,
+ "grad_norm": 0.7582404613494873,
+ "learning_rate": 0.000288170865279299,
+ "loss": 2.0254,
+ "step": 280
+ },
+ {
+ "epoch": 0.1864951768488746,
+ "grad_norm": 0.9620535969734192,
+ "learning_rate": 0.00028751369112814894,
+ "loss": 1.9978,
+ "step": 290
+ },
+ {
+ "epoch": 0.19292604501607716,
+ "grad_norm": 0.7374221682548523,
+ "learning_rate": 0.00028685651697699885,
+ "loss": 2.0631,
+ "step": 300
+ },
+ {
+ "epoch": 0.19935691318327975,
+ "grad_norm": 0.794651210308075,
+ "learning_rate": 0.0002861993428258488,
+ "loss": 1.9507,
+ "step": 310
+ },
+ {
+ "epoch": 0.2057877813504823,
+ "grad_norm": 0.7450920939445496,
+ "learning_rate": 0.00028554216867469873,
+ "loss": 2.0363,
+ "step": 320
+ },
+ {
+ "epoch": 0.21221864951768488,
+ "grad_norm": 0.7574348449707031,
+ "learning_rate": 0.0002848849945235487,
+ "loss": 2.0508,
+ "step": 330
+ },
+ {
+ "epoch": 0.21864951768488747,
+ "grad_norm": 0.9118533134460449,
+ "learning_rate": 0.00028422782037239866,
+ "loss": 2.0118,
+ "step": 340
+ },
+ {
+ "epoch": 0.22508038585209003,
+ "grad_norm": 0.8136394023895264,
+ "learning_rate": 0.0002835706462212486,
+ "loss": 2.1211,
+ "step": 350
+ },
+ {
+ "epoch": 0.2315112540192926,
+ "grad_norm": 0.9099079966545105,
+ "learning_rate": 0.00028291347207009854,
+ "loss": 2.0346,
+ "step": 360
+ },
+ {
+ "epoch": 0.2379421221864952,
+ "grad_norm": 0.830896258354187,
+ "learning_rate": 0.0002822562979189485,
+ "loss": 2.0494,
+ "step": 370
+ },
+ {
+ "epoch": 0.24437299035369775,
+ "grad_norm": 0.789002001285553,
+ "learning_rate": 0.0002815991237677984,
+ "loss": 1.9791,
+ "step": 380
+ },
+ {
+ "epoch": 0.2508038585209003,
+ "grad_norm": 0.8194644451141357,
+ "learning_rate": 0.0002809419496166484,
+ "loss": 2.0106,
+ "step": 390
+ },
+ {
+ "epoch": 0.2572347266881029,
+ "grad_norm": 0.8226191401481628,
+ "learning_rate": 0.00028028477546549835,
+ "loss": 2.0268,
+ "step": 400
+ },
+ {
+ "epoch": 0.2572347266881029,
+ "eval_loss": 2.057727575302124,
+ "eval_runtime": 127.2637,
+ "eval_samples_per_second": 15.715,
+ "eval_steps_per_second": 1.964,
+ "step": 400
+ },
+ {
+ "epoch": 0.26366559485530544,
+ "grad_norm": 0.796454668045044,
+ "learning_rate": 0.00027962760131434826,
+ "loss": 2.0376,
+ "step": 410
+ },
+ {
+ "epoch": 0.27009646302250806,
+ "grad_norm": 0.8327352404594421,
+ "learning_rate": 0.0002789704271631982,
+ "loss": 2.0481,
+ "step": 420
+ },
+ {
+ "epoch": 0.2765273311897106,
+ "grad_norm": 0.8051420450210571,
+ "learning_rate": 0.0002783132530120482,
+ "loss": 1.99,
+ "step": 430
+ },
+ {
+ "epoch": 0.2829581993569132,
+ "grad_norm": 0.7519128322601318,
+ "learning_rate": 0.0002776560788608981,
+ "loss": 2.0339,
+ "step": 440
+ },
+ {
+ "epoch": 0.28938906752411575,
+ "grad_norm": 0.8251495957374573,
+ "learning_rate": 0.00027699890470974807,
+ "loss": 2.0289,
+ "step": 450
+ },
+ {
+ "epoch": 0.2958199356913183,
+ "grad_norm": 0.7058277130126953,
+ "learning_rate": 0.000276341730558598,
+ "loss": 2.0669,
+ "step": 460
+ },
+ {
+ "epoch": 0.3022508038585209,
+ "grad_norm": 0.8475114107131958,
+ "learning_rate": 0.00027568455640744795,
+ "loss": 2.0506,
+ "step": 470
+ },
+ {
+ "epoch": 0.3086816720257235,
+ "grad_norm": 0.7855744957923889,
+ "learning_rate": 0.0002750273822562979,
+ "loss": 1.97,
+ "step": 480
+ },
+ {
+ "epoch": 0.31511254019292606,
+ "grad_norm": 0.727988064289093,
+ "learning_rate": 0.0002743702081051478,
+ "loss": 2.0705,
+ "step": 490
+ },
+ {
+ "epoch": 0.3215434083601286,
+ "grad_norm": 0.7662935853004456,
+ "learning_rate": 0.0002737130339539978,
+ "loss": 1.9678,
+ "step": 500
+ },
+ {
+ "epoch": 0.3279742765273312,
+ "grad_norm": 0.9171555638313293,
+ "learning_rate": 0.00027305585980284776,
+ "loss": 1.9818,
+ "step": 510
+ },
+ {
+ "epoch": 0.33440514469453375,
+ "grad_norm": 0.7959179282188416,
+ "learning_rate": 0.00027239868565169767,
+ "loss": 2.0014,
+ "step": 520
+ },
+ {
+ "epoch": 0.3408360128617363,
+ "grad_norm": 0.9359775185585022,
+ "learning_rate": 0.00027174151150054763,
+ "loss": 2.0244,
+ "step": 530
+ },
+ {
+ "epoch": 0.34726688102893893,
+ "grad_norm": 0.7740966081619263,
+ "learning_rate": 0.0002710843373493976,
+ "loss": 2.0883,
+ "step": 540
+ },
+ {
+ "epoch": 0.3536977491961415,
+ "grad_norm": 0.868601381778717,
+ "learning_rate": 0.0002704271631982475,
+ "loss": 2.0226,
+ "step": 550
+ },
+ {
+ "epoch": 0.36012861736334406,
+ "grad_norm": 0.8721134662628174,
+ "learning_rate": 0.0002697699890470975,
+ "loss": 2.0965,
+ "step": 560
+ },
+ {
+ "epoch": 0.3665594855305466,
+ "grad_norm": 0.8080394268035889,
+ "learning_rate": 0.00026911281489594744,
+ "loss": 2.0082,
+ "step": 570
+ },
+ {
+ "epoch": 0.3729903536977492,
+ "grad_norm": 1.7169413566589355,
+ "learning_rate": 0.00026845564074479735,
+ "loss": 2.039,
+ "step": 580
+ },
+ {
+ "epoch": 0.37942122186495175,
+ "grad_norm": 0.8220880031585693,
+ "learning_rate": 0.0002677984665936473,
+ "loss": 2.0696,
+ "step": 590
+ },
+ {
+ "epoch": 0.3858520900321543,
+ "grad_norm": 0.7639694213867188,
+ "learning_rate": 0.00026714129244249723,
+ "loss": 2.0014,
+ "step": 600
+ },
+ {
+ "epoch": 0.3858520900321543,
+ "eval_loss": 2.0443177223205566,
+ "eval_runtime": 133.8726,
+ "eval_samples_per_second": 14.94,
+ "eval_steps_per_second": 1.867,
+ "step": 600
+ },
+ {
+ "epoch": 0.39228295819935693,
+ "grad_norm": 0.817965567111969,
+ "learning_rate": 0.0002664841182913472,
+ "loss": 2.0553,
+ "step": 610
+ },
+ {
+ "epoch": 0.3987138263665595,
+ "grad_norm": 0.871166467666626,
+ "learning_rate": 0.00026582694414019716,
+ "loss": 2.0027,
+ "step": 620
+ },
+ {
+ "epoch": 0.40514469453376206,
+ "grad_norm": 0.7483948469161987,
+ "learning_rate": 0.00026516976998904707,
+ "loss": 2.0355,
+ "step": 630
+ },
+ {
+ "epoch": 0.4115755627009646,
+ "grad_norm": 0.8223303556442261,
+ "learning_rate": 0.00026451259583789704,
+ "loss": 2.0076,
+ "step": 640
+ },
+ {
+ "epoch": 0.4180064308681672,
+ "grad_norm": 0.80986088514328,
+ "learning_rate": 0.00026385542168674695,
+ "loss": 2.0781,
+ "step": 650
+ },
+ {
+ "epoch": 0.42443729903536975,
+ "grad_norm": 0.7527362704277039,
+ "learning_rate": 0.0002631982475355969,
+ "loss": 1.9727,
+ "step": 660
+ },
+ {
+ "epoch": 0.43086816720257237,
+ "grad_norm": 0.7571489810943604,
+ "learning_rate": 0.0002625410733844469,
+ "loss": 2.0205,
+ "step": 670
+ },
+ {
+ "epoch": 0.43729903536977494,
+ "grad_norm": 0.7976600527763367,
+ "learning_rate": 0.0002618838992332968,
+ "loss": 2.0505,
+ "step": 680
+ },
+ {
+ "epoch": 0.4437299035369775,
+ "grad_norm": 0.8057394623756409,
+ "learning_rate": 0.00026122672508214676,
+ "loss": 2.0351,
+ "step": 690
+ },
+ {
+ "epoch": 0.45016077170418006,
+ "grad_norm": 0.8420009016990662,
+ "learning_rate": 0.0002605695509309967,
+ "loss": 1.9655,
+ "step": 700
+ },
+ {
+ "epoch": 0.4565916398713826,
+ "grad_norm": 0.853597104549408,
+ "learning_rate": 0.00025991237677984664,
+ "loss": 1.9939,
+ "step": 710
+ },
+ {
+ "epoch": 0.4630225080385852,
+ "grad_norm": 0.7588443160057068,
+ "learning_rate": 0.0002592552026286966,
+ "loss": 2.032,
+ "step": 720
+ },
+ {
+ "epoch": 0.4694533762057878,
+ "grad_norm": 0.8099080920219421,
+ "learning_rate": 0.0002585980284775465,
+ "loss": 1.9817,
+ "step": 730
+ },
+ {
+ "epoch": 0.4758842443729904,
+ "grad_norm": 0.7894070148468018,
+ "learning_rate": 0.0002579408543263965,
+ "loss": 2.0001,
+ "step": 740
+ },
+ {
+ "epoch": 0.48231511254019294,
+ "grad_norm": 0.7474116683006287,
+ "learning_rate": 0.00025728368017524644,
+ "loss": 2.0077,
+ "step": 750
+ },
+ {
+ "epoch": 0.4887459807073955,
+ "grad_norm": 0.8076878786087036,
+ "learning_rate": 0.00025662650602409636,
+ "loss": 2.0394,
+ "step": 760
+ },
+ {
+ "epoch": 0.49517684887459806,
+ "grad_norm": 0.7559667825698853,
+ "learning_rate": 0.0002559693318729463,
+ "loss": 1.9753,
+ "step": 770
+ },
+ {
+ "epoch": 0.5016077170418006,
+ "grad_norm": 0.7402215600013733,
+ "learning_rate": 0.00025531215772179623,
+ "loss": 2.0353,
+ "step": 780
+ },
+ {
+ "epoch": 0.5080385852090032,
+ "grad_norm": 0.7112523317337036,
+ "learning_rate": 0.0002546549835706462,
+ "loss": 1.989,
+ "step": 790
+ },
+ {
+ "epoch": 0.5144694533762058,
+ "grad_norm": 0.7255666255950928,
+ "learning_rate": 0.00025399780941949616,
+ "loss": 1.9912,
+ "step": 800
+ },
+ {
+ "epoch": 0.5144694533762058,
+ "eval_loss": 2.0358893871307373,
+ "eval_runtime": 131.9747,
+ "eval_samples_per_second": 15.154,
+ "eval_steps_per_second": 1.894,
+ "step": 800
+ },
+ {
+ "epoch": 0.5209003215434084,
+ "grad_norm": 0.7614848613739014,
+ "learning_rate": 0.0002533406352683461,
+ "loss": 1.9507,
+ "step": 810
+ },
+ {
+ "epoch": 0.5273311897106109,
+ "grad_norm": 0.7834282517433167,
+ "learning_rate": 0.00025268346111719604,
+ "loss": 2.0572,
+ "step": 820
+ },
+ {
+ "epoch": 0.5337620578778135,
+ "grad_norm": 0.8642615079879761,
+ "learning_rate": 0.00025202628696604595,
+ "loss": 1.9766,
+ "step": 830
+ },
+ {
+ "epoch": 0.5401929260450161,
+ "grad_norm": 0.7937222123146057,
+ "learning_rate": 0.0002513691128148959,
+ "loss": 1.9718,
+ "step": 840
+ },
+ {
+ "epoch": 0.5466237942122186,
+ "grad_norm": 0.7922580242156982,
+ "learning_rate": 0.0002507119386637459,
+ "loss": 2.0098,
+ "step": 850
+ },
+ {
+ "epoch": 0.5530546623794212,
+ "grad_norm": 0.7464605569839478,
+ "learning_rate": 0.0002500547645125958,
+ "loss": 1.9529,
+ "step": 860
+ },
+ {
+ "epoch": 0.5594855305466238,
+ "grad_norm": 0.7568275332450867,
+ "learning_rate": 0.00024939759036144576,
+ "loss": 1.989,
+ "step": 870
+ },
+ {
+ "epoch": 0.5659163987138264,
+ "grad_norm": 0.7011362910270691,
+ "learning_rate": 0.00024874041621029573,
+ "loss": 2.031,
+ "step": 880
+ },
+ {
+ "epoch": 0.572347266881029,
+ "grad_norm": 0.7106270790100098,
+ "learning_rate": 0.00024808324205914564,
+ "loss": 2.022,
+ "step": 890
+ },
+ {
+ "epoch": 0.5787781350482315,
+ "grad_norm": 0.7415210604667664,
+ "learning_rate": 0.0002474260679079956,
+ "loss": 2.0595,
+ "step": 900
+ },
+ {
+ "epoch": 0.5852090032154341,
+ "grad_norm": 0.7313567399978638,
+ "learning_rate": 0.0002467688937568455,
+ "loss": 2.0293,
+ "step": 910
+ },
+ {
+ "epoch": 0.5916398713826366,
+ "grad_norm": 0.692523181438446,
+ "learning_rate": 0.0002461117196056955,
+ "loss": 2.0746,
+ "step": 920
+ },
+ {
+ "epoch": 0.5980707395498392,
+ "grad_norm": 0.6929277181625366,
+ "learning_rate": 0.00024545454545454545,
+ "loss": 1.955,
+ "step": 930
+ },
+ {
+ "epoch": 0.6045016077170418,
+ "grad_norm": 0.7199161648750305,
+ "learning_rate": 0.00024479737130339536,
+ "loss": 2.0454,
+ "step": 940
+ },
+ {
+ "epoch": 0.6109324758842444,
+ "grad_norm": 0.767314076423645,
+ "learning_rate": 0.00024414019715224533,
+ "loss": 2.0428,
+ "step": 950
+ },
+ {
+ "epoch": 0.617363344051447,
+ "grad_norm": 0.8044443130493164,
+ "learning_rate": 0.00024348302300109526,
+ "loss": 1.9423,
+ "step": 960
+ },
+ {
+ "epoch": 0.6237942122186495,
+ "grad_norm": 0.702936589717865,
+ "learning_rate": 0.0002428258488499452,
+ "loss": 1.9271,
+ "step": 970
+ },
+ {
+ "epoch": 0.6302250803858521,
+ "grad_norm": 0.7394160032272339,
+ "learning_rate": 0.00024216867469879517,
+ "loss": 1.9674,
+ "step": 980
+ },
+ {
+ "epoch": 0.6366559485530546,
+ "grad_norm": 0.7981842160224915,
+ "learning_rate": 0.0002415115005476451,
+ "loss": 1.9932,
+ "step": 990
+ },
+ {
+ "epoch": 0.6430868167202572,
+ "grad_norm": 0.871896505355835,
+ "learning_rate": 0.00024085432639649505,
+ "loss": 2.0182,
+ "step": 1000
+ },
+ {
+ "epoch": 0.6430868167202572,
+ "eval_loss": 2.024224281311035,
+ "eval_runtime": 130.1041,
+ "eval_samples_per_second": 15.372,
+ "eval_steps_per_second": 1.922,
+ "step": 1000
+ },
+ {
+ "epoch": 0.6495176848874598,
+ "grad_norm": 0.7123499512672424,
+ "learning_rate": 0.00024019715224534498,
+ "loss": 2.0923,
+ "step": 1010
+ },
+ {
+ "epoch": 0.6559485530546624,
+ "grad_norm": 0.7226546406745911,
+ "learning_rate": 0.00023953997809419495,
+ "loss": 2.0035,
+ "step": 1020
+ },
+ {
+ "epoch": 0.662379421221865,
+ "grad_norm": 0.7627468109130859,
+ "learning_rate": 0.0002388828039430449,
+ "loss": 1.9667,
+ "step": 1030
+ },
+ {
+ "epoch": 0.6688102893890675,
+ "grad_norm": 0.8175467252731323,
+ "learning_rate": 0.00023822562979189483,
+ "loss": 1.948,
+ "step": 1040
+ },
+ {
+ "epoch": 0.6752411575562701,
+ "grad_norm": 0.690073549747467,
+ "learning_rate": 0.0002375684556407448,
+ "loss": 2.0498,
+ "step": 1050
+ },
+ {
+ "epoch": 0.6816720257234726,
+ "grad_norm": 0.9848446249961853,
+ "learning_rate": 0.0002369112814895947,
+ "loss": 1.9874,
+ "step": 1060
+ },
+ {
+ "epoch": 0.6881028938906752,
+ "grad_norm": 0.7157571315765381,
+ "learning_rate": 0.00023625410733844467,
+ "loss": 2.0488,
+ "step": 1070
+ },
+ {
+ "epoch": 0.6945337620578779,
+ "grad_norm": 0.8503302931785583,
+ "learning_rate": 0.00023559693318729464,
+ "loss": 1.9958,
+ "step": 1080
+ },
+ {
+ "epoch": 0.7009646302250804,
+ "grad_norm": 0.7864677906036377,
+ "learning_rate": 0.00023493975903614455,
+ "loss": 2.0212,
+ "step": 1090
+ },
+ {
+ "epoch": 0.707395498392283,
+ "grad_norm": 1.7837698459625244,
+ "learning_rate": 0.0002342825848849945,
+ "loss": 1.9828,
+ "step": 1100
+ },
+ {
+ "epoch": 0.7138263665594855,
+ "grad_norm": 0.7183972001075745,
+ "learning_rate": 0.00023362541073384445,
+ "loss": 2.0652,
+ "step": 1110
+ },
+ {
+ "epoch": 0.7202572347266881,
+ "grad_norm": 0.7377676963806152,
+ "learning_rate": 0.0002329682365826944,
+ "loss": 2.0123,
+ "step": 1120
+ },
+ {
+ "epoch": 0.7266881028938906,
+ "grad_norm": 0.7170071601867676,
+ "learning_rate": 0.00023231106243154436,
+ "loss": 1.9759,
+ "step": 1130
+ },
+ {
+ "epoch": 0.7331189710610932,
844
+ "grad_norm": 0.6442170143127441,
845
+ "learning_rate": 0.00023165388828039427,
846
+ "loss": 2.047,
847
+ "step": 1140
848
+ },
849
+ {
850
+ "epoch": 0.7395498392282959,
851
+ "grad_norm": 0.7356306910514832,
852
+ "learning_rate": 0.00023099671412924423,
853
+ "loss": 2.0438,
854
+ "step": 1150
855
+ },
856
+ {
857
+ "epoch": 0.7459807073954984,
858
+ "grad_norm": 0.7483031153678894,
859
+ "learning_rate": 0.0002303395399780942,
860
+ "loss": 2.0274,
861
+ "step": 1160
862
+ },
863
+ {
864
+ "epoch": 0.752411575562701,
865
+ "grad_norm": 0.7624642848968506,
866
+ "learning_rate": 0.0002296823658269441,
867
+ "loss": 1.9938,
868
+ "step": 1170
869
+ },
870
+ {
871
+ "epoch": 0.7588424437299035,
872
+ "grad_norm": 0.7435073256492615,
873
+ "learning_rate": 0.00022902519167579408,
874
+ "loss": 1.9848,
875
+ "step": 1180
876
+ },
877
+ {
878
+ "epoch": 0.7652733118971061,
879
+ "grad_norm": 0.7327163219451904,
880
+ "learning_rate": 0.000228368017524644,
881
+ "loss": 2.0286,
882
+ "step": 1190
883
+ },
884
+ {
885
+ "epoch": 0.7717041800643086,
886
+ "grad_norm": 0.8398700952529907,
887
+ "learning_rate": 0.00022771084337349395,
888
+ "loss": 1.999,
889
+ "step": 1200
890
+ },
891
+ {
892
+ "epoch": 0.7717041800643086,
893
+ "eval_loss": 2.0166773796081543,
894
+ "eval_runtime": 129.989,
895
+ "eval_samples_per_second": 15.386,
896
+ "eval_steps_per_second": 1.923,
897
+ "step": 1200
898
+ },
899
+ {
+ "epoch": 0.7781350482315113,
+ "grad_norm": 0.6727181673049927,
+ "learning_rate": 0.00022705366922234392,
+ "loss": 2.0044,
+ "step": 1210
+ },
+ {
+ "epoch": 0.7845659163987139,
+ "grad_norm": 0.8738404512405396,
+ "learning_rate": 0.00022639649507119383,
+ "loss": 2.0246,
+ "step": 1220
+ },
+ {
+ "epoch": 0.7909967845659164,
+ "grad_norm": 0.760010302066803,
+ "learning_rate": 0.0002257393209200438,
+ "loss": 2.0058,
+ "step": 1230
+ },
+ {
+ "epoch": 0.797427652733119,
+ "grad_norm": 0.701081395149231,
+ "learning_rate": 0.00022508214676889373,
+ "loss": 1.9974,
+ "step": 1240
+ },
+ {
+ "epoch": 0.8038585209003215,
+ "grad_norm": 0.7346913814544678,
+ "learning_rate": 0.00022442497261774367,
+ "loss": 2.0884,
+ "step": 1250
+ },
+ {
+ "epoch": 0.8102893890675241,
+ "grad_norm": 0.7433114647865295,
+ "learning_rate": 0.00022376779846659364,
+ "loss": 1.9927,
+ "step": 1260
+ },
+ {
+ "epoch": 0.8167202572347267,
+ "grad_norm": 0.7781444787979126,
+ "learning_rate": 0.00022311062431544358,
+ "loss": 2.001,
+ "step": 1270
+ },
+ {
+ "epoch": 0.8231511254019293,
+ "grad_norm": 0.7538995742797852,
+ "learning_rate": 0.00022245345016429352,
+ "loss": 1.9947,
+ "step": 1280
+ },
+ {
+ "epoch": 0.8295819935691319,
+ "grad_norm": 0.7132537961006165,
+ "learning_rate": 0.00022179627601314345,
+ "loss": 1.9781,
+ "step": 1290
+ },
+ {
+ "epoch": 0.8360128617363344,
+ "grad_norm": 0.7174340486526489,
+ "learning_rate": 0.0002211391018619934,
+ "loss": 1.9848,
+ "step": 1300
+ },
+ {
+ "epoch": 0.842443729903537,
+ "grad_norm": 0.7245258092880249,
+ "learning_rate": 0.00022048192771084336,
+ "loss": 2.005,
+ "step": 1310
+ },
+ {
+ "epoch": 0.8488745980707395,
+ "grad_norm": 0.667892336845398,
+ "learning_rate": 0.0002198247535596933,
+ "loss": 1.9939,
+ "step": 1320
+ },
+ {
+ "epoch": 0.8553054662379421,
+ "grad_norm": 0.7173146605491638,
+ "learning_rate": 0.00021916757940854324,
+ "loss": 2.0636,
+ "step": 1330
+ },
+ {
+ "epoch": 0.8617363344051447,
+ "grad_norm": 0.7765901684761047,
+ "learning_rate": 0.0002185104052573932,
+ "loss": 1.9966,
+ "step": 1340
+ },
+ {
+ "epoch": 0.8681672025723473,
+ "grad_norm": 0.7077351808547974,
+ "learning_rate": 0.00021785323110624314,
+ "loss": 2.0078,
+ "step": 1350
+ },
+ {
+ "epoch": 0.8745980707395499,
+ "grad_norm": 0.736723780632019,
+ "learning_rate": 0.00021719605695509308,
+ "loss": 2.0292,
+ "step": 1360
+ },
+ {
+ "epoch": 0.8810289389067524,
+ "grad_norm": 0.732185959815979,
+ "learning_rate": 0.00021653888280394302,
+ "loss": 2.0223,
+ "step": 1370
+ },
+ {
+ "epoch": 0.887459807073955,
+ "grad_norm": 0.7002454400062561,
+ "learning_rate": 0.00021588170865279298,
+ "loss": 2.0068,
+ "step": 1380
+ },
+ {
+ "epoch": 0.8938906752411575,
+ "grad_norm": 0.75859534740448,
+ "learning_rate": 0.00021522453450164292,
+ "loss": 1.9556,
+ "step": 1390
+ },
+ {
+ "epoch": 0.9003215434083601,
+ "grad_norm": 0.7475289106369019,
+ "learning_rate": 0.00021456736035049286,
+ "loss": 1.9792,
+ "step": 1400
+ },
+ {
+ "epoch": 0.9003215434083601,
+ "eval_loss": 2.0089023113250732,
+ "eval_runtime": 130.0325,
+ "eval_samples_per_second": 15.381,
+ "eval_steps_per_second": 1.923,
+ "step": 1400
+ },
+ {
+ "epoch": 0.9067524115755627,
+ "grad_norm": 0.7917546629905701,
+ "learning_rate": 0.00021391018619934283,
+ "loss": 1.9999,
+ "step": 1410
+ },
+ {
+ "epoch": 0.9131832797427653,
+ "grad_norm": 0.7062447667121887,
+ "learning_rate": 0.00021325301204819274,
+ "loss": 1.9779,
+ "step": 1420
+ },
+ {
+ "epoch": 0.9196141479099679,
+ "grad_norm": 0.6973288655281067,
+ "learning_rate": 0.0002125958378970427,
+ "loss": 2.0511,
+ "step": 1430
+ },
+ {
+ "epoch": 0.9260450160771704,
+ "grad_norm": 0.7297340035438538,
+ "learning_rate": 0.00021193866374589267,
+ "loss": 1.9764,
+ "step": 1440
+ },
+ {
+ "epoch": 0.932475884244373,
+ "grad_norm": 0.9256350994110107,
+ "learning_rate": 0.00021128148959474258,
+ "loss": 1.9559,
+ "step": 1450
+ },
+ {
+ "epoch": 0.9389067524115756,
+ "grad_norm": 0.6994000673294067,
+ "learning_rate": 0.00021062431544359255,
+ "loss": 2.0152,
+ "step": 1460
+ },
+ {
+ "epoch": 0.9453376205787781,
+ "grad_norm": 0.7412806749343872,
+ "learning_rate": 0.00020996714129244246,
+ "loss": 1.9494,
+ "step": 1470
+ },
+ {
+ "epoch": 0.9517684887459807,
+ "grad_norm": 0.729680061340332,
+ "learning_rate": 0.00020930996714129242,
+ "loss": 2.0272,
+ "step": 1480
+ },
+ {
+ "epoch": 0.9581993569131833,
+ "grad_norm": 0.7601342797279358,
+ "learning_rate": 0.0002086527929901424,
+ "loss": 1.9714,
+ "step": 1490
+ },
+ {
+ "epoch": 0.9646302250803859,
+ "grad_norm": 0.6875161528587341,
+ "learning_rate": 0.0002079956188389923,
+ "loss": 1.993,
+ "step": 1500
+ },
+ {
+ "epoch": 0.9710610932475884,
+ "grad_norm": 0.7520968317985535,
+ "learning_rate": 0.00020733844468784227,
+ "loss": 2.0471,
+ "step": 1510
+ },
+ {
+ "epoch": 0.977491961414791,
+ "grad_norm": 0.8061411380767822,
+ "learning_rate": 0.00020668127053669218,
+ "loss": 2.0145,
+ "step": 1520
+ },
+ {
+ "epoch": 0.9839228295819936,
+ "grad_norm": 0.7837228775024414,
+ "learning_rate": 0.00020602409638554214,
+ "loss": 1.9889,
+ "step": 1530
+ },
+ {
+ "epoch": 0.9903536977491961,
+ "grad_norm": 0.744296133518219,
+ "learning_rate": 0.0002053669222343921,
+ "loss": 1.9834,
+ "step": 1540
+ },
+ {
+ "epoch": 0.9967845659163987,
+ "grad_norm": 0.7137749791145325,
+ "learning_rate": 0.00020470974808324202,
+ "loss": 2.0582,
+ "step": 1550
+ },
+ {
+ "epoch": 1.0032154340836013,
+ "grad_norm": 0.718320906162262,
+ "learning_rate": 0.000204052573932092,
+ "loss": 1.9576,
+ "step": 1560
+ },
+ {
+ "epoch": 1.0096463022508038,
+ "grad_norm": 0.719998836517334,
+ "learning_rate": 0.00020339539978094195,
+ "loss": 1.9138,
+ "step": 1570
+ },
+ {
+ "epoch": 1.0160771704180065,
+ "grad_norm": 0.7154316306114197,
+ "learning_rate": 0.00020273822562979186,
+ "loss": 1.875,
+ "step": 1580
+ },
+ {
+ "epoch": 1.022508038585209,
+ "grad_norm": 0.6565534472465515,
+ "learning_rate": 0.00020208105147864183,
+ "loss": 1.9994,
+ "step": 1590
+ },
+ {
+ "epoch": 1.0289389067524115,
+ "grad_norm": 0.7222368121147156,
+ "learning_rate": 0.00020142387732749177,
+ "loss": 1.9591,
+ "step": 1600
+ },
+ {
+ "epoch": 1.0289389067524115,
+ "eval_loss": 2.002497673034668,
+ "eval_runtime": 131.2869,
+ "eval_samples_per_second": 15.234,
+ "eval_steps_per_second": 1.904,
+ "step": 1600
+ },
+ {
+ "epoch": 1.0353697749196142,
+ "grad_norm": 0.7213057279586792,
+ "learning_rate": 0.0002007667031763417,
+ "loss": 1.9464,
+ "step": 1610
+ },
+ {
+ "epoch": 1.0418006430868167,
+ "grad_norm": 0.6436830163002014,
+ "learning_rate": 0.00020010952902519167,
+ "loss": 1.8951,
+ "step": 1620
+ },
+ {
+ "epoch": 1.0482315112540193,
+ "grad_norm": 0.7160071134567261,
+ "learning_rate": 0.00019945235487404158,
+ "loss": 1.9062,
+ "step": 1630
+ },
+ {
+ "epoch": 1.0546623794212218,
+ "grad_norm": 0.6585739850997925,
+ "learning_rate": 0.00019879518072289155,
+ "loss": 1.9514,
+ "step": 1640
+ },
+ {
+ "epoch": 1.0610932475884245,
+ "grad_norm": 0.7445241808891296,
+ "learning_rate": 0.0001981380065717415,
+ "loss": 1.8301,
+ "step": 1650
+ },
+ {
+ "epoch": 1.067524115755627,
+ "grad_norm": 0.6654142141342163,
+ "learning_rate": 0.00019748083242059143,
+ "loss": 1.9048,
+ "step": 1660
+ },
+ {
+ "epoch": 1.0739549839228295,
+ "grad_norm": 0.7550114393234253,
+ "learning_rate": 0.0001968236582694414,
+ "loss": 1.9266,
+ "step": 1670
+ },
+ {
+ "epoch": 1.0803858520900322,
+ "grad_norm": 0.7276896834373474,
+ "learning_rate": 0.00019616648411829133,
+ "loss": 1.8942,
+ "step": 1680
+ },
+ {
+ "epoch": 1.0868167202572347,
+ "grad_norm": 0.7431575059890747,
+ "learning_rate": 0.00019550930996714127,
+ "loss": 1.9148,
+ "step": 1690
+ },
+ {
+ "epoch": 1.0932475884244373,
+ "grad_norm": 0.74256831407547,
+ "learning_rate": 0.0001948521358159912,
+ "loss": 1.942,
+ "step": 1700
+ },
+ {
+ "epoch": 1.09967845659164,
+ "grad_norm": 0.7295734286308289,
+ "learning_rate": 0.00019419496166484117,
+ "loss": 1.9331,
+ "step": 1710
+ },
+ {
+ "epoch": 1.1061093247588425,
+ "grad_norm": 0.7749672532081604,
+ "learning_rate": 0.0001935377875136911,
+ "loss": 1.9373,
+ "step": 1720
+ },
+ {
+ "epoch": 1.112540192926045,
+ "grad_norm": 0.6896611452102661,
+ "learning_rate": 0.00019288061336254105,
+ "loss": 1.8813,
+ "step": 1730
+ },
+ {
+ "epoch": 1.1189710610932475,
+ "grad_norm": 0.7282217741012573,
+ "learning_rate": 0.00019222343921139102,
+ "loss": 1.9634,
+ "step": 1740
+ },
+ {
+ "epoch": 1.1254019292604502,
+ "grad_norm": 0.7761743068695068,
+ "learning_rate": 0.00019156626506024093,
+ "loss": 1.8708,
+ "step": 1750
+ },
+ {
+ "epoch": 1.1318327974276527,
+ "grad_norm": 0.7596757411956787,
+ "learning_rate": 0.0001909090909090909,
+ "loss": 1.9446,
+ "step": 1760
+ },
+ {
+ "epoch": 1.1382636655948553,
+ "grad_norm": 0.7023797631263733,
+ "learning_rate": 0.00019025191675794086,
+ "loss": 1.8837,
+ "step": 1770
+ },
+ {
+ "epoch": 1.144694533762058,
+ "grad_norm": 0.7191573977470398,
+ "learning_rate": 0.00018959474260679077,
+ "loss": 1.9141,
+ "step": 1780
+ },
+ {
+ "epoch": 1.1511254019292605,
+ "grad_norm": 0.784885048866272,
+ "learning_rate": 0.00018893756845564074,
+ "loss": 1.9506,
+ "step": 1790
+ },
+ {
+ "epoch": 1.157556270096463,
+ "grad_norm": 0.710903525352478,
+ "learning_rate": 0.00018828039430449068,
+ "loss": 1.9157,
+ "step": 1800
+ },
+ {
+ "epoch": 1.157556270096463,
+ "eval_loss": 1.998835563659668,
+ "eval_runtime": 121.0458,
+ "eval_samples_per_second": 16.523,
+ "eval_steps_per_second": 2.065,
+ "step": 1800
+ },
+ {
+ "epoch": 1.1639871382636655,
+ "grad_norm": 0.7552351355552673,
+ "learning_rate": 0.00018762322015334062,
+ "loss": 1.9139,
+ "step": 1810
+ },
+ {
+ "epoch": 1.1704180064308682,
+ "grad_norm": 0.7722271084785461,
+ "learning_rate": 0.00018696604600219058,
+ "loss": 1.863,
+ "step": 1820
+ },
+ {
+ "epoch": 1.1768488745980707,
+ "grad_norm": 0.7195548415184021,
+ "learning_rate": 0.0001863088718510405,
+ "loss": 1.8697,
+ "step": 1830
+ },
+ {
+ "epoch": 1.1832797427652733,
+ "grad_norm": 0.7423893809318542,
+ "learning_rate": 0.00018565169769989046,
+ "loss": 1.9772,
+ "step": 1840
+ },
+ {
+ "epoch": 1.189710610932476,
+ "grad_norm": 0.7222315073013306,
+ "learning_rate": 0.00018499452354874042,
+ "loss": 1.9308,
+ "step": 1850
+ },
+ {
+ "epoch": 1.1961414790996785,
+ "grad_norm": 0.6815035939216614,
+ "learning_rate": 0.00018433734939759034,
+ "loss": 1.9675,
+ "step": 1860
+ },
+ {
+ "epoch": 1.202572347266881,
+ "grad_norm": 0.7621594071388245,
+ "learning_rate": 0.0001836801752464403,
+ "loss": 1.9295,
+ "step": 1870
+ },
+ {
+ "epoch": 1.2090032154340835,
+ "grad_norm": 0.7405025959014893,
+ "learning_rate": 0.0001830230010952902,
+ "loss": 1.9088,
+ "step": 1880
+ },
+ {
+ "epoch": 1.2154340836012862,
+ "grad_norm": 0.6729809641838074,
+ "learning_rate": 0.00018236582694414018,
+ "loss": 1.9446,
+ "step": 1890
+ },
+ {
+ "epoch": 1.2218649517684887,
+ "grad_norm": 0.7389471530914307,
+ "learning_rate": 0.00018170865279299014,
+ "loss": 1.8841,
+ "step": 1900
+ },
+ {
+ "epoch": 1.2282958199356913,
+ "grad_norm": 0.6453628540039062,
+ "learning_rate": 0.00018105147864184006,
+ "loss": 1.8661,
+ "step": 1910
+ },
+ {
+ "epoch": 1.234726688102894,
+ "grad_norm": 0.6971079111099243,
+ "learning_rate": 0.00018039430449069002,
+ "loss": 1.9807,
+ "step": 1920
+ },
+ {
+ "epoch": 1.2411575562700965,
+ "grad_norm": 0.7807840704917908,
+ "learning_rate": 0.00017973713033953996,
+ "loss": 1.9475,
+ "step": 1930
+ },
+ {
+ "epoch": 1.247588424437299,
+ "grad_norm": 0.78909832239151,
+ "learning_rate": 0.0001790799561883899,
+ "loss": 1.8439,
+ "step": 1940
+ },
+ {
+ "epoch": 1.2540192926045015,
+ "grad_norm": 0.7715321183204651,
+ "learning_rate": 0.00017842278203723986,
+ "loss": 1.9478,
+ "step": 1950
+ },
+ {
+ "epoch": 1.2604501607717042,
+ "grad_norm": 0.7786479592323303,
+ "learning_rate": 0.0001777656078860898,
+ "loss": 1.8773,
+ "step": 1960
+ },
+ {
+ "epoch": 1.2668810289389068,
+ "grad_norm": 0.6935726404190063,
+ "learning_rate": 0.00017710843373493974,
+ "loss": 1.94,
+ "step": 1970
+ },
+ {
+ "epoch": 1.2733118971061093,
+ "grad_norm": 0.7824066877365112,
+ "learning_rate": 0.00017645125958378968,
+ "loss": 1.8996,
+ "step": 1980
+ },
+ {
+ "epoch": 1.279742765273312,
+ "grad_norm": 0.7019379138946533,
+ "learning_rate": 0.00017579408543263962,
+ "loss": 1.9114,
+ "step": 1990
+ },
+ {
+ "epoch": 1.2861736334405145,
+ "grad_norm": 0.8215466737747192,
+ "learning_rate": 0.00017513691128148958,
+ "loss": 1.8294,
+ "step": 2000
+ },
+ {
+ "epoch": 1.2861736334405145,
+ "eval_loss": 1.9947528839111328,
+ "eval_runtime": 132.3397,
+ "eval_samples_per_second": 15.113,
+ "eval_steps_per_second": 1.889,
+ "step": 2000
+ },
+ {
+ "epoch": 1.292604501607717,
+ "grad_norm": 0.7088531851768494,
+ "learning_rate": 0.00017447973713033952,
+ "loss": 1.9497,
+ "step": 2010
+ },
+ {
+ "epoch": 1.2990353697749195,
+ "grad_norm": 0.7754150032997131,
+ "learning_rate": 0.00017382256297918946,
+ "loss": 1.9047,
+ "step": 2020
+ },
+ {
+ "epoch": 1.3054662379421222,
+ "grad_norm": 0.7185202836990356,
+ "learning_rate": 0.00017316538882803943,
+ "loss": 1.8529,
+ "step": 2030
+ },
+ {
+ "epoch": 1.3118971061093248,
+ "grad_norm": 0.7496573328971863,
+ "learning_rate": 0.00017250821467688937,
+ "loss": 1.8618,
+ "step": 2040
+ },
+ {
+ "epoch": 1.3183279742765273,
+ "grad_norm": 0.6794284582138062,
+ "learning_rate": 0.0001718510405257393,
+ "loss": 1.898,
+ "step": 2050
+ },
+ {
+ "epoch": 1.32475884244373,
+ "grad_norm": 0.7059448957443237,
+ "learning_rate": 0.00017119386637458924,
+ "loss": 1.9594,
+ "step": 2060
+ },
+ {
+ "epoch": 1.3311897106109325,
+ "grad_norm": 0.7007871866226196,
+ "learning_rate": 0.0001705366922234392,
+ "loss": 1.9476,
+ "step": 2070
+ },
+ {
+ "epoch": 1.337620578778135,
+ "grad_norm": 0.6973986029624939,
+ "learning_rate": 0.00016987951807228915,
+ "loss": 1.9567,
+ "step": 2080
+ },
+ {
+ "epoch": 1.3440514469453375,
+ "grad_norm": 0.7169969081878662,
+ "learning_rate": 0.00016922234392113909,
+ "loss": 1.9685,
+ "step": 2090
+ },
+ {
+ "epoch": 1.3504823151125402,
+ "grad_norm": 0.7009272575378418,
+ "learning_rate": 0.00016856516976998905,
+ "loss": 1.9714,
+ "step": 2100
+ },
+ {
+ "epoch": 1.3569131832797428,
+ "grad_norm": 0.7070193290710449,
+ "learning_rate": 0.00016790799561883896,
+ "loss": 1.9695,
+ "step": 2110
+ },
+ {
+ "epoch": 1.3633440514469453,
+ "grad_norm": 0.7268947958946228,
+ "learning_rate": 0.00016725082146768893,
+ "loss": 1.9107,
+ "step": 2120
+ },
+ {
+ "epoch": 1.369774919614148,
+ "grad_norm": 0.7544928789138794,
+ "learning_rate": 0.00016659364731653887,
+ "loss": 1.8658,
+ "step": 2130
+ },
+ {
+ "epoch": 1.3762057877813505,
+ "grad_norm": 0.6320627927780151,
+ "learning_rate": 0.0001659364731653888,
+ "loss": 1.8917,
+ "step": 2140
+ },
+ {
+ "epoch": 1.382636655948553,
+ "grad_norm": 0.6863923668861389,
+ "learning_rate": 0.00016527929901423877,
+ "loss": 1.9237,
+ "step": 2150
+ },
+ {
+ "epoch": 1.3890675241157555,
+ "grad_norm": 0.7775669097900391,
+ "learning_rate": 0.00016462212486308868,
+ "loss": 1.8548,
+ "step": 2160
+ },
+ {
+ "epoch": 1.3954983922829582,
+ "grad_norm": 0.7198719382286072,
+ "learning_rate": 0.00016396495071193865,
+ "loss": 1.9145,
+ "step": 2170
+ },
+ {
+ "epoch": 1.4019292604501608,
+ "grad_norm": 0.7938317656517029,
+ "learning_rate": 0.00016330777656078861,
+ "loss": 1.8939,
+ "step": 2180
+ },
+ {
+ "epoch": 1.4083601286173635,
+ "grad_norm": 0.7361711263656616,
+ "learning_rate": 0.00016265060240963853,
+ "loss": 1.9642,
+ "step": 2190
+ },
+ {
+ "epoch": 1.414790996784566,
+ "grad_norm": 0.7385576963424683,
+ "learning_rate": 0.0001619934282584885,
+ "loss": 1.9134,
+ "step": 2200
+ },
+ {
+ "epoch": 1.414790996784566,
+ "eval_loss": 1.9883830547332764,
+ "eval_runtime": 130.0767,
+ "eval_samples_per_second": 15.376,
+ "eval_steps_per_second": 1.922,
+ "step": 2200
+ },
+ {
+ "epoch": 1.4212218649517685,
+ "grad_norm": 0.7863461971282959,
+ "learning_rate": 0.0001613362541073384,
+ "loss": 2.0157,
+ "step": 2210
+ },
+ {
+ "epoch": 1.427652733118971,
+ "grad_norm": 0.7755898237228394,
+ "learning_rate": 0.00016067907995618837,
+ "loss": 1.8973,
+ "step": 2220
+ },
+ {
+ "epoch": 1.4340836012861735,
+ "grad_norm": 0.7090388536453247,
+ "learning_rate": 0.00016002190580503833,
+ "loss": 1.9034,
+ "step": 2230
+ },
+ {
+ "epoch": 1.4405144694533762,
+ "grad_norm": 0.6487644910812378,
+ "learning_rate": 0.00015936473165388825,
+ "loss": 1.906,
+ "step": 2240
+ },
+ {
+ "epoch": 1.4469453376205788,
+ "grad_norm": 0.6597898006439209,
+ "learning_rate": 0.0001587075575027382,
+ "loss": 1.843,
+ "step": 2250
+ },
+ {
+ "epoch": 1.4533762057877815,
+ "grad_norm": 0.7069796323776245,
+ "learning_rate": 0.00015805038335158818,
+ "loss": 1.9554,
+ "step": 2260
+ },
+ {
+ "epoch": 1.459807073954984,
+ "grad_norm": 0.7358680367469788,
+ "learning_rate": 0.0001573932092004381,
+ "loss": 1.9268,
+ "step": 2270
+ },
+ {
+ "epoch": 1.4662379421221865,
+ "grad_norm": 0.675457775592804,
+ "learning_rate": 0.00015673603504928806,
+ "loss": 1.8981,
+ "step": 2280
+ },
+ {
+ "epoch": 1.472668810289389,
+ "grad_norm": 0.7369397878646851,
+ "learning_rate": 0.000156078860898138,
+ "loss": 1.9535,
+ "step": 2290
+ },
+ {
+ "epoch": 1.4790996784565915,
+ "grad_norm": 0.666994035243988,
+ "learning_rate": 0.00015542168674698793,
+ "loss": 1.8657,
+ "step": 2300
+ },
+ {
+ "epoch": 1.4855305466237942,
+ "grad_norm": 0.7241340279579163,
+ "learning_rate": 0.0001547645125958379,
+ "loss": 1.8097,
+ "step": 2310
+ },
+ {
+ "epoch": 1.4919614147909968,
+ "grad_norm": 0.7224936485290527,
+ "learning_rate": 0.0001541073384446878,
+ "loss": 1.8397,
+ "step": 2320
+ },
+ {
+ "epoch": 1.4983922829581995,
+ "grad_norm": 0.7167637348175049,
+ "learning_rate": 0.00015345016429353778,
+ "loss": 1.9225,
+ "step": 2330
+ },
+ {
+ "epoch": 1.504823151125402,
+ "grad_norm": 0.7176666259765625,
+ "learning_rate": 0.00015279299014238771,
+ "loss": 1.8764,
+ "step": 2340
+ },
+ {
+ "epoch": 1.5112540192926045,
+ "grad_norm": 0.735252857208252,
+ "learning_rate": 0.00015213581599123765,
+ "loss": 1.8935,
+ "step": 2350
+ },
+ {
+ "epoch": 1.517684887459807,
+ "grad_norm": 0.6805827021598816,
+ "learning_rate": 0.00015147864184008762,
+ "loss": 1.9212,
+ "step": 2360
+ },
+ {
+ "epoch": 1.5241157556270095,
+ "grad_norm": 0.7019375562667847,
+ "learning_rate": 0.00015082146768893756,
+ "loss": 1.9318,
+ "step": 2370
+ },
+ {
+ "epoch": 1.5305466237942122,
+ "grad_norm": 0.6795372366905212,
+ "learning_rate": 0.0001501642935377875,
+ "loss": 1.9023,
+ "step": 2380
+ },
+ {
+ "epoch": 1.5369774919614148,
+ "grad_norm": 0.6497982144355774,
+ "learning_rate": 0.00014950711938663743,
+ "loss": 1.9721,
+ "step": 2390
+ },
+ {
+ "epoch": 1.5434083601286175,
+ "grad_norm": 0.7713346481323242,
+ "learning_rate": 0.0001488499452354874,
+ "loss": 1.9906,
+ "step": 2400
+ },
+ {
+ "epoch": 1.5434083601286175,
+ "eval_loss": 1.9822700023651123,
+ "eval_runtime": 130.376,
+ "eval_samples_per_second": 15.34,
+ "eval_steps_per_second": 1.918,
+ "step": 2400
+ },
+ {
+ "epoch": 1.54983922829582,
+ "grad_norm": 0.7202898263931274,
+ "learning_rate": 0.00014819277108433734,
+ "loss": 1.8816,
+ "step": 2410
+ },
+ {
+ "epoch": 1.5562700964630225,
+ "grad_norm": 0.7167313694953918,
+ "learning_rate": 0.00014753559693318728,
+ "loss": 1.9316,
+ "step": 2420
+ },
+ {
+ "epoch": 1.562700964630225,
+ "grad_norm": 0.7133712768554688,
+ "learning_rate": 0.00014687842278203724,
+ "loss": 2.0053,
+ "step": 2430
+ },
+ {
+ "epoch": 1.5691318327974275,
+ "grad_norm": 0.76304692029953,
+ "learning_rate": 0.00014622124863088718,
+ "loss": 1.8718,
+ "step": 2440
+ },
+ {
+ "epoch": 1.5755627009646302,
+ "grad_norm": 0.667654812335968,
+ "learning_rate": 0.00014556407447973712,
+ "loss": 1.8727,
+ "step": 2450
+ },
+ {
+ "epoch": 1.5819935691318328,
+ "grad_norm": 0.7308873534202576,
+ "learning_rate": 0.00014490690032858706,
+ "loss": 1.8918,
+ "step": 2460
+ },
+ {
+ "epoch": 1.5884244372990355,
+ "grad_norm": 0.9376251697540283,
+ "learning_rate": 0.00014424972617743702,
+ "loss": 1.96,
+ "step": 2470
+ },
+ {
+ "epoch": 1.594855305466238,
+ "grad_norm": 0.6924982666969299,
+ "learning_rate": 0.00014359255202628696,
+ "loss": 1.8744,
+ "step": 2480
+ },
+ {
+ "epoch": 1.6012861736334405,
+ "grad_norm": 0.7420899868011475,
+ "learning_rate": 0.0001429353778751369,
+ "loss": 1.9112,
+ "step": 2490
+ },
+ {
+ "epoch": 1.607717041800643,
+ "grad_norm": 0.7384818196296692,
+ "learning_rate": 0.00014227820372398684,
+ "loss": 1.9562,
+ "step": 2500
+ },
+ {
+ "epoch": 1.6141479099678455,
+ "grad_norm": 0.7550799250602722,
+ "learning_rate": 0.0001416210295728368,
+ "loss": 1.891,
+ "step": 2510
+ },
+ {
+ "epoch": 1.6205787781350482,
+ "grad_norm": 0.7184371948242188,
+ "learning_rate": 0.00014096385542168674,
+ "loss": 1.9361,
+ "step": 2520
+ },
+ {
+ "epoch": 1.6270096463022508,
+ "grad_norm": 0.770914614200592,
+ "learning_rate": 0.00014030668127053668,
+ "loss": 1.9132,
+ "step": 2530
+ },
+ {
+ "epoch": 1.6334405144694535,
+ "grad_norm": 0.7566716074943542,
+ "learning_rate": 0.00013964950711938662,
+ "loss": 1.8982,
+ "step": 2540
+ },
+ {
+ "epoch": 1.639871382636656,
+ "grad_norm": 0.6670147776603699,
+ "learning_rate": 0.00013899233296823656,
+ "loss": 1.9211,
+ "step": 2550
+ },
+ {
+ "epoch": 1.6463022508038585,
+ "grad_norm": 0.7093060612678528,
+ "learning_rate": 0.00013833515881708653,
+ "loss": 1.8881,
+ "step": 2560
+ },
+ {
+ "epoch": 1.652733118971061,
+ "grad_norm": 0.6549977660179138,
+ "learning_rate": 0.00013767798466593646,
+ "loss": 1.9187,
+ "step": 2570
+ },
+ {
+ "epoch": 1.6591639871382635,
+ "grad_norm": 0.7039531469345093,
+ "learning_rate": 0.0001370208105147864,
+ "loss": 1.9165,
+ "step": 2580
+ },
+ {
+ "epoch": 1.6655948553054662,
+ "grad_norm": 0.7216307520866394,
+ "learning_rate": 0.00013636363636363634,
+ "loss": 1.9228,
+ "step": 2590
+ },
+ {
+ "epoch": 1.6720257234726688,
+ "grad_norm": 0.6866537928581238,
+ "learning_rate": 0.00013570646221248628,
+ "loss": 1.9003,
+ "step": 2600
+ },
+ {
+ "epoch": 1.6720257234726688,
+ "eval_loss": 1.977206826210022,
+ "eval_runtime": 131.9243,
+ "eval_samples_per_second": 15.16,
+ "eval_steps_per_second": 1.895,
+ "step": 2600
+ },
+ {
+ "epoch": 1.6784565916398715,
+ "grad_norm": 0.7328875660896301,
+ "learning_rate": 0.00013504928806133625,
+ "loss": 1.9,
+ "step": 2610
+ },
+ {
+ "epoch": 1.684887459807074,
+ "grad_norm": 0.7623500227928162,
+ "learning_rate": 0.00013439211391018618,
+ "loss": 1.9117,
+ "step": 2620
+ },
+ {
+ "epoch": 1.6913183279742765,
+ "grad_norm": 0.6996557712554932,
+ "learning_rate": 0.00013373493975903612,
+ "loss": 1.8342,
+ "step": 2630
+ },
+ {
+ "epoch": 1.697749196141479,
+ "grad_norm": 0.6597011685371399,
+ "learning_rate": 0.00013307776560788606,
+ "loss": 1.911,
+ "step": 2640
+ },
+ {
+ "epoch": 1.7041800643086815,
+ "grad_norm": 0.7154627442359924,
+ "learning_rate": 0.00013242059145673603,
+ "loss": 1.8955,
+ "step": 2650
+ },
+ {
+ "epoch": 1.7106109324758842,
+ "grad_norm": 0.6822642087936401,
+ "learning_rate": 0.00013176341730558597,
+ "loss": 1.928,
+ "step": 2660
+ },
+ {
+ "epoch": 1.717041800643087,
+ "grad_norm": 0.6770340204238892,
+ "learning_rate": 0.0001311062431544359,
+ "loss": 1.934,
+ "step": 2670
+ },
+ {
+ "epoch": 1.7234726688102895,
+ "grad_norm": 0.7235671877861023,
+ "learning_rate": 0.00013044906900328584,
+ "loss": 1.9248,
+ "step": 2680
+ },
+ {
+ "epoch": 1.729903536977492,
+ "grad_norm": 0.6428620219230652,
+ "learning_rate": 0.0001297918948521358,
+ "loss": 1.8998,
+ "step": 2690
+ },
+ {
+ "epoch": 1.7363344051446945,
+ "grad_norm": 0.7132564783096313,
+ "learning_rate": 0.00012913472070098575,
+ "loss": 1.9353,
+ "step": 2700
+ },
+ {
+ "epoch": 1.742765273311897,
+ "grad_norm": 0.7110019326210022,
+ "learning_rate": 0.0001284775465498357,
+ "loss": 1.8877,
+ "step": 2710
+ },
+ {
+ "epoch": 1.7491961414790995,
+ "grad_norm": 0.7546197772026062,
+ "learning_rate": 0.00012782037239868565,
+ "loss": 1.9219,
+ "step": 2720
+ },
+ {
+ "epoch": 1.7556270096463023,
+ "grad_norm": 0.8485615253448486,
+ "learning_rate": 0.0001271631982475356,
+ "loss": 1.9238,
+ "step": 2730
+ },
+ {
+ "epoch": 1.762057877813505,
+ "grad_norm": 0.7058401703834534,
+ "learning_rate": 0.00012650602409638553,
+ "loss": 1.9012,
+ "step": 2740
+ },
+ {
+ "epoch": 1.7684887459807075,
+ "grad_norm": 0.7222112417221069,
+ "learning_rate": 0.00012584884994523547,
+ "loss": 1.8442,
+ "step": 2750
+ },
+ {
+ "epoch": 1.77491961414791,
+ "grad_norm": 0.7010639905929565,
+ "learning_rate": 0.00012519167579408543,
+ "loss": 1.9322,
+ "step": 2760
+ },
+ {
+ "epoch": 1.7813504823151125,
+ "grad_norm": 0.6908234357833862,
+ "learning_rate": 0.00012453450164293537,
+ "loss": 1.9456,
+ "step": 2770
+ },
+ {
2055
+ "epoch": 1.787781350482315,
2056
+ "grad_norm": 0.6615903973579407,
2057
+ "learning_rate": 0.0001238773274917853,
2058
+ "loss": 1.9052,
2059
+ "step": 2780
2060
+ },
2061
+ {
2062
+ "epoch": 1.7942122186495175,
2063
+ "grad_norm": 0.6688089370727539,
2064
+ "learning_rate": 0.00012322015334063528,
2065
+ "loss": 1.87,
2066
+ "step": 2790
2067
+ },
2068
+ {
2069
+ "epoch": 1.8006430868167203,
2070
+ "grad_norm": 0.7396994233131409,
2071
+ "learning_rate": 0.00012256297918948522,
2072
+ "loss": 1.9243,
2073
+ "step": 2800
2074
+ },
2075
+ {
2076
+ "epoch": 1.8006430868167203,
2077
+ "eval_loss": 1.974278450012207,
2078
+ "eval_runtime": 144.2243,
2079
+ "eval_samples_per_second": 13.867,
2080
+ "eval_steps_per_second": 1.733,
2081
+ "step": 2800
2082
+ },
2083
+ {
2084
+ "epoch": 1.807073954983923,
2085
+ "grad_norm": 0.6520466208457947,
2086
+ "learning_rate": 0.00012190580503833514,
2087
+ "loss": 1.902,
2088
+ "step": 2810
2089
+ },
2090
+ {
2091
+ "epoch": 1.8135048231511255,
2092
+ "grad_norm": 0.7591603398323059,
2093
+ "learning_rate": 0.00012124863088718509,
2094
+ "loss": 1.9079,
2095
+ "step": 2820
2096
+ },
2097
+ {
2098
+ "epoch": 1.819935691318328,
2099
+ "grad_norm": 0.6622514128684998,
2100
+ "learning_rate": 0.00012059145673603504,
2101
+ "loss": 1.9288,
2102
+ "step": 2830
2103
+ },
2104
+ {
2105
+ "epoch": 1.8263665594855305,
2106
+ "grad_norm": 0.7578607797622681,
2107
+ "learning_rate": 0.00011993428258488498,
2108
+ "loss": 1.8936,
2109
+ "step": 2840
2110
+ },
2111
+ {
2112
+ "epoch": 1.832797427652733,
2113
+ "grad_norm": 0.730093240737915,
2114
+ "learning_rate": 0.00011927710843373494,
2115
+ "loss": 1.8809,
2116
+ "step": 2850
2117
+ },
2118
+ {
2119
+ "epoch": 1.8392282958199357,
2120
+ "grad_norm": 0.6403250098228455,
2121
+ "learning_rate": 0.00011861993428258487,
2122
+ "loss": 1.8866,
2123
+ "step": 2860
2124
+ },
2125
+ {
2126
+ "epoch": 1.8456591639871383,
2127
+ "grad_norm": 0.7032350897789001,
2128
+ "learning_rate": 0.00011796276013143481,
2129
+ "loss": 1.938,
2130
+ "step": 2870
2131
+ },
2132
+ {
2133
+ "epoch": 1.852090032154341,
2134
+ "grad_norm": 0.7376342415809631,
2135
+ "learning_rate": 0.00011730558598028478,
2136
+ "loss": 1.8925,
2137
+ "step": 2880
2138
+ },
2139
+ {
2140
+ "epoch": 1.8585209003215435,
2141
+ "grad_norm": 0.7093110680580139,
2142
+ "learning_rate": 0.00011664841182913472,
2143
+ "loss": 1.9029,
2144
+ "step": 2890
2145
+ },
2146
+ {
2147
+ "epoch": 1.864951768488746,
2148
+ "grad_norm": 0.6826250553131104,
2149
+ "learning_rate": 0.00011599123767798466,
2150
+ "loss": 1.8956,
2151
+ "step": 2900
2152
+ },
2153
+ {
2154
+ "epoch": 1.8713826366559485,
2155
+ "grad_norm": 0.7709969282150269,
2156
+ "learning_rate": 0.0001153340635268346,
2157
+ "loss": 1.92,
2158
+ "step": 2910
2159
+ },
2160
+ {
2161
+ "epoch": 1.877813504823151,
2162
+ "grad_norm": 0.6641222238540649,
2163
+ "learning_rate": 0.00011467688937568453,
2164
+ "loss": 1.8998,
2165
+ "step": 2920
2166
+ },
2167
+ {
2168
+ "epoch": 1.8842443729903537,
2169
+ "grad_norm": 0.7321887612342834,
2170
+ "learning_rate": 0.0001140197152245345,
2171
+ "loss": 1.9257,
2172
+ "step": 2930
2173
+ },
2174
+ {
2175
+ "epoch": 1.8906752411575563,
2176
+ "grad_norm": 0.7000001668930054,
2177
+ "learning_rate": 0.00011336254107338444,
2178
+ "loss": 1.8944,
2179
+ "step": 2940
2180
+ },
2181
+ {
2182
+ "epoch": 1.897106109324759,
2183
+ "grad_norm": 0.7347818613052368,
2184
+ "learning_rate": 0.00011270536692223438,
2185
+ "loss": 1.9256,
2186
+ "step": 2950
2187
+ },
2188
+ {
2189
+ "epoch": 1.9035369774919615,
2190
+ "grad_norm": 0.708888590335846,
2191
+ "learning_rate": 0.00011204819277108433,
2192
+ "loss": 1.9307,
2193
+ "step": 2960
2194
+ },
2195
+ {
2196
+ "epoch": 1.909967845659164,
2197
+ "grad_norm": 0.6980915665626526,
2198
+ "learning_rate": 0.00011139101861993428,
2199
+ "loss": 1.883,
2200
+ "step": 2970
2201
+ },
2202
+ {
2203
+ "epoch": 1.9163987138263665,
2204
+ "grad_norm": 0.8052535653114319,
2205
+ "learning_rate": 0.00011073384446878422,
2206
+ "loss": 1.899,
2207
+ "step": 2980
2208
+ },
2209
+ {
2210
+ "epoch": 1.922829581993569,
2211
+ "grad_norm": 0.707011878490448,
2212
+ "learning_rate": 0.00011007667031763416,
2213
+ "loss": 1.9263,
2214
+ "step": 2990
2215
+ },
2216
+ {
2217
+ "epoch": 1.9292604501607717,
2218
+ "grad_norm": 0.7086938619613647,
2219
+ "learning_rate": 0.00010941949616648411,
2220
+ "loss": 1.883,
2221
+ "step": 3000
2222
+ },
2223
+ {
2224
+ "epoch": 1.9292604501607717,
2225
+ "eval_loss": 1.9664931297302246,
2226
+ "eval_runtime": 133.023,
2227
+ "eval_samples_per_second": 15.035,
2228
+ "eval_steps_per_second": 1.879,
2229
+ "step": 3000
2230
+ }
2231
+ ],
2232
+ "logging_steps": 10,
2233
+ "max_steps": 4665,
2234
+ "num_input_tokens_seen": 0,
2235
+ "num_train_epochs": 3,
2236
+ "save_steps": 200,
2237
+ "stateful_callbacks": {
2238
+ "TrainerControl": {
2239
+ "args": {
2240
+ "should_epoch_stop": false,
2241
+ "should_evaluate": false,
2242
+ "should_log": false,
2243
+ "should_save": true,
2244
+ "should_training_stop": false
2245
+ },
2246
+ "attributes": {}
2247
+ }
2248
+ },
2249
+ "total_flos": 3.0137669676957696e+17,
2250
+ "train_batch_size": 16,
2251
+ "trial_name": null,
2252
+ "trial_params": null
2253
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cab22ba79cda15f54ce907097c40aecb6ebd4c038c6657764e0d5bf9d78a133c
+ size 5048