Alex Zhuang committed on
Commit
f8db401
1 Parent(s): 15d8d07
README.md ADDED
@@ -0,0 +1,131 @@
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - TIGER-Lab/SKGInstruct
5
+ language:
6
+ - en
7
+ ---
8
+ # 🏗️ StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
9
+
10
+ <span style="color:red">This checkpoint appears to have an issue; please use https://huggingface.co/TIGER-Lab/StructLM-7B-Mistral instead.</span>
11
+
12
+ Project Page: [https://tiger-ai-lab.github.io/StructLM/](https://tiger-ai-lab.github.io/StructLM/)
13
+
14
+ Paper: [https://arxiv.org/pdf/2402.16671.pdf](https://arxiv.org/pdf/2402.16671.pdf)
15
+
16
+ Code: [https://github.com/TIGER-AI-Lab/StructLM](https://github.com/TIGER-AI-Lab/StructLM)
17
+
18
+
19
+ ![Alt text](https://raw.githubusercontent.com/TIGER-AI-Lab/StructLM/gh-pages/static/images/thumbnail.drawio.png)
20
+
21
+ ## Introduction
22
+ StructLM is a series of open-source large language models (LLMs) fine-tuned for structured knowledge grounding (SKG) tasks. We release 3 models:
23
+
24
+ 7B | [StructLM-7B](https://huggingface.co/TIGER-Lab/StructLM-7B)
25
+
26
+ 13B | [StructLM-13B](https://huggingface.co/TIGER-Lab/StructLM-13B)
27
+
28
+ 34B | [StructLM-34B](https://huggingface.co/TIGER-Lab/StructLM-34B)
29
+
30
+
31
+ ## Training Data
32
+ These models are trained on 🤗 [SKGInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/SKGInstruct), an instruction-tuning dataset containing a mixture of 19 SKG tasks combined with 🤗 [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca). Check out the dataset card for more details.
33
+
34
+
35
+ ## Training Procedure
36
+ The models are fine-tuned from CodeLlama-Instruct-hf base models. Each model is trained for 3 epochs, and the best checkpoint is selected.
37
+
38
+ ## Evaluation
39
+ Here is a subset of the model evaluation results:
40
+
41
+ ### Held in
42
+
43
+ | **Model** | **ToTTo** | **GrailQA** | **CompWebQ** | **MMQA** | **Feverous** | **Spider** | **TabFact** | **Dart** |
44
+ |-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|
45
+ | **StructLM-7B** | 49.4 | 80.4 | 78.3 | 85.2 | 84.4 | 72.4 | 80.8 | 62.2 |
46
+ | **StructLM-13B** | 49.3 | 79.2 | 80.4 | 86.0 | 85.0 | 74.1 | 84.7 | 61.4 |
47
+ | **StructLM-34B** | 50.2 | 82.2 | 81.9 | 88.1 | 85.7 | 74.6 | 86.6 | 61.8 |
48
+
49
+
50
+ ### Held out
51
+ | **Model** | **BIRD** | **InfoTabs** | **FinQA** | **SQA** |
52
+ |-----------------------|--------------|----------|----------|----------|
53
+ | **StructLM-7B** | 22.3 | 55.3 | 27.3 | 49.7 |
54
+ | **StructLM-13B** | 22.8 | 58.1 | 25.6 | 36.1 |
55
+ | **StructLM-34B** | 24.7 | 61.8 | 36.2 | 44.2 |
56
+
57
+
58
+ ## Usage
59
+ You can use the models through Hugging Face's Transformers library.
60
+ Check our GitHub repo for the evaluation code: [https://github.com/TIGER-AI-Lab/StructLM](https://github.com/TIGER-AI-Lab/StructLM)
61
+
62
+
63
+ ## Prompt Format
64
+
65
+ **For this 7B model, the prompt format (different from 13B, 34B) is**
66
+ ```
67
+ [INST] <<SYS>>
68
+ You are an AI assistant that specializes in analyzing and reasoning over structured information. You will be given a task, optionally with some structured knowledge input. Your answer must strictly adhere to the output format, if specified.
69
+ <</SYS>>
70
+ {instruction} [/INST]
71
+ ```
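The template above can be filled in with a small helper. This is a minimal sketch, not the project's own code: the name `build_prompt` is hypothetical, and the exact newline placement is an assumption read off the template (the example input later in this README shows extra blank lines, so whitespace may differ in practice).

```python
# System prompt copied verbatim from the README's prompt format.
SYSTEM_PROMPT = (
    "You are an AI assistant that specializes in analyzing and reasoning over "
    "structured information. You will be given a task, optionally with some "
    "structured knowledge input. Your answer must strictly adhere to the output "
    "format, if specified."
)


def build_prompt(instruction: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap an instruction in the [INST] / <<SYS>> template (hypothetical helper)."""
    # Newline placement follows the template as printed; this is an assumption.
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n{instruction} [/INST]"


prompt = build_prompt("What is the range of the numbers?")
print(prompt)
```

The resulting string can be passed to a tokenizer as-is; whether additional special tokens are needed depends on the tokenizer configuration, which is not covered here.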
72
+
73
+ Structured inputs are linearized into plain text before being placed in the prompt. To see concrete examples of this linearization, you can directly reference the 🤗 [SKGInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/SKGInstruct) (coming soon).
74
+ We will provide code for linearizing this data shortly.
75
+
76
+
77
+ A few examples:
78
+
79
+ **Tabular data**
80
+ ```
81
+ col : day | kilometers row 1 : tuesday | 0 row 2 : wednesday | 0 row 3 : thursday | 4 row 4 : friday | 0 row 5 : saturday | 0
82
+ ```
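Until the official linearization code is available, the tabular format above can be reproduced with a few lines. This is a sketch inferred from the single example shown; `linearize_table` is a hypothetical name, and the separators (`col :`, `row N :`, ` | `) are assumptions based on that example rather than a documented specification.

```python
def linearize_table(header, rows):
    """Flatten a table into 'col : ... row 1 : ... row 2 : ...' text.

    Format inferred from the README example; not the official implementation.
    """
    # Header cells are joined with " | " and prefixed with "col : ".
    parts = ["col : " + " | ".join(str(h) for h in header)]
    # Each row is numbered from 1 and uses the same cell separator.
    for index, row in enumerate(rows, start=1):
        parts.append(f"row {index} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


linearized = linearize_table(
    ["day", "kilometers"],
    [("tuesday", 0), ("wednesday", 0), ("thursday", 4), ("friday", 0), ("saturday", 0)],
)
print(linearized)
```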
83
+
84
+ **Knowledge triples (dart)**
85
+ ```
86
+ Hawaii Five-O : notes : Episode: The Flight of the Jewels | [TABLECONTEXT] : [title] : Jeff Daniels | [TABLECONTEXT] : title : Hawaii Five-O
87
+ ```
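The triple format above appears to join `subject : relation : object` items with ` | `. A minimal sketch under that assumption follows; `linearize_triples` is a hypothetical name, and the format is inferred from this one example only.

```python
def linearize_triples(triples):
    """Join (subject, relation, object) triples as 's : r : o | s : r : o'.

    Format inferred from the README example; not the official implementation.
    """
    return " | ".join(f"{s} : {r} : {o}" for s, r, o in triples)


linearized_triples = linearize_triples(
    [
        ("Hawaii Five-O", "notes", "Episode: The Flight of the Jewels"),
        ("[TABLECONTEXT]", "[title]", "Jeff Daniels"),
        ("[TABLECONTEXT]", "title", "Hawaii Five-O"),
    ]
)
print(linearized_triples)
```

Note that values may themselves contain colons (as in `Episode: The Flight of the Jewels`), so this text is easy to produce but not trivially reversible.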
88
+
89
+ **Knowledge graph schema (grailqa)**
90
+ ```
91
+ top antiquark: m.094nrqp | physics.particle_antiparticle.self_antiparticle physics.particle_family physics.particle.antiparticle physics.particle_family.subclasses physics.subatomic_particle_generation physics.particle_family.particles physics.particle common.image.appears_in_topic_gallery physics.subatomic_particle_generation.particles physics.particle.family physics.particle_family.parent_class physics.particle_antiparticle physics.particle_antiparticle.particle physics.particle.generation
92
+ ```
93
+
94
+ **Example input**
95
+
96
+ ```
97
+ [INST] <<SYS>>
98
+ You are an AI assistant that specializes in analyzing and reasoning over structured information. You will be given a task, optionally with some structured knowledge input. Your answer must strictly adhere to the output format, if specified.
99
+ <</SYS>>
100
+
101
+ Use the information in the following table to solve the problem, choose between the choices if they are provided. table:
102
+
103
+ col : day | kilometers row 1 : tuesday | 0 row 2 : wednesday | 0 row 3 : thursday | 4 row 4 : friday | 0 row 5 : saturday | 0
104
+
105
+
106
+ question:
107
+
108
+ Allie kept track of how many kilometers she walked during the past 5 days. What is the range of the numbers? [/INST]
109
+ ```
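When checking model outputs for an input like the one above, it can help to compute the reference answer directly from the linearized table. The sketch below parses the table text back into rows and computes the range (maximum minus minimum) of the `kilometers` column. `parse_linearized_table` is a hypothetical helper, and it assumes no cell contains the text `row N :` or a `|` character.

```python
import re


def parse_linearized_table(text):
    """Split 'col : ... row 1 : ... row 2 : ...' text into a header and rows."""
    # Split at every "row N : " marker; the first chunk holds the header.
    chunks = re.split(r"\s*row \d+ : ", text.strip())
    header_chunk = chunks[0]
    if header_chunk.startswith("col : "):
        header_chunk = header_chunk[len("col : "):]
    header = [cell.strip() for cell in header_chunk.split("|")]
    rows = [[cell.strip() for cell in chunk.split("|")] for chunk in chunks[1:]]
    return header, rows


table = (
    "col : day | kilometers row 1 : tuesday | 0 row 2 : wednesday | 0 "
    "row 3 : thursday | 4 row 4 : friday | 0 row 5 : saturday | 0"
)
header, rows = parse_linearized_table(table)
kilometers = [int(row[header.index("kilometers")]) for row in rows]
value_range = max(kilometers) - min(kilometers)
print(value_range)  # prints 4
```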
110
+
111
+
112
+ ## Intended Uses
113
+ These models are trained for research purposes. They are designed to be proficient in interpreting linearized structured input. Downstream uses can potentially include various applications requiring the interpretation of structured data.
114
+
115
+ ## Limitations
116
+ While we've tried to build an SKG-specialized model capable of generalizing, we have shown that this is a challenging domain, and the model may lack the performance characteristics needed for direct use in chat or other applications.
117
+
118
+
119
+ ## Citation
120
+ If you use the models, data, or code from this project, please cite the original paper:
121
+
122
+ ```
123
+ @misc{zhuang2024structlm,
124
+ title={StructLM: Towards Building Generalist Models for Structured Knowledge Grounding},
125
+ author={Alex Zhuang and Ge Zhang and Tianyu Zheng and Xinrun Du and Junjie Wang and Weiming Ren and Stephen W. Huang and Jie Fu and Xiang Yue and Wenhu Chen},
126
+ year={2024},
127
+ eprint={2402.16671},
128
+ archivePrefix={arXiv},
129
+ primaryClass={cs.CL}
130
+ }
131
+ ```
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:106f2be18cffc71d94bd3723e3db3c119f1c690a5e5653dbc5decaf52e145107
3
  size 4939116424
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b3a88670dc47a655be9bec1230bbcf936ac900ad5786797fd731b30789c2bfd3
3
  size 4939116424
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5ee55e66dfd0b59b06ebbae040671aa04c6564758dc36b6cf5aec9ef9aa66be5
3
  size 4947390880
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff1afbfb752bd0bf541812a6933918f2dc353e0fb78245141ad961dae4759ad3
3
  size 4947390880
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b411ec4a54fe493464d74f8a3484eed5c8eea732a96ba98e6a1fc848f2d5ccc4
3
  size 3590619888
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d646f796359e02f0e0f7496b204a909d37a9dcd425c85a3fa792a5291bc98bcf
3
  size 3590619888
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 2.9990309034133737,
5
  "eval_steps": 500,
6
- "global_step": 6963,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -3157,1728 +3157,6 @@
3157
  "learning_rate": 6.097595982676103e-06,
3158
  "loss": 0.5065,
3159
  "step": 4500
3160
- },
3161
- {
3162
- "epoch": 1.9425002691934963,
3163
- "grad_norm": 0.5176106095314026,
3164
- "learning_rate": 6.053907100217648e-06,
3165
- "loss": 0.5155,
3166
- "step": 4510
3167
- },
3168
- {
3169
- "epoch": 1.9468073651340583,
3170
- "grad_norm": 0.4772352874279022,
3171
- "learning_rate": 6.010307248062514e-06,
3172
- "loss": 0.5056,
3173
- "step": 4520
3174
- },
3175
- {
3176
- "epoch": 1.9511144610746203,
3177
- "grad_norm": 0.5366437435150146,
3178
- "learning_rate": 5.966797409894607e-06,
3179
- "loss": 0.4888,
3180
- "step": 4530
3181
- },
3182
- {
3183
- "epoch": 1.9554215570151825,
3184
- "grad_norm": 0.4917809069156647,
3185
- "learning_rate": 5.923378567366956e-06,
3186
- "loss": 0.5221,
3187
- "step": 4540
3188
- },
3189
- {
3190
- "epoch": 1.9597286529557447,
3191
- "grad_norm": 0.5597509741783142,
3192
- "learning_rate": 5.880051700079596e-06,
3193
- "loss": 0.5225,
3194
- "step": 4550
3195
- },
3196
- {
3197
- "epoch": 1.9640357488963067,
3198
- "grad_norm": 0.5258151888847351,
3199
- "learning_rate": 5.836817785557448e-06,
3200
- "loss": 0.5031,
3201
- "step": 4560
3202
- },
3203
- {
3204
- "epoch": 1.9683428448368687,
3205
- "grad_norm": 0.5679864287376404,
3206
- "learning_rate": 5.7936777992282565e-06,
3207
- "loss": 0.5074,
3208
- "step": 4570
3209
- },
3210
- {
3211
- "epoch": 1.9726499407774307,
3212
- "grad_norm": 0.5309889912605286,
3213
- "learning_rate": 5.750632714400607e-06,
3214
- "loss": 0.521,
3215
- "step": 4580
3216
- },
3217
- {
3218
- "epoch": 1.976957036717993,
3219
- "grad_norm": 0.5293132662773132,
3220
- "learning_rate": 5.707683502241936e-06,
3221
- "loss": 0.5133,
3222
- "step": 4590
3223
- },
3224
- {
3225
- "epoch": 1.981264132658555,
3226
- "grad_norm": 0.5223381519317627,
3227
- "learning_rate": 5.664831131756652e-06,
3228
- "loss": 0.5129,
3229
- "step": 4600
3230
- },
3231
- {
3232
- "epoch": 1.9855712285991172,
3233
- "grad_norm": 0.5365522503852844,
3234
- "learning_rate": 5.622076569764247e-06,
3235
- "loss": 0.504,
3236
- "step": 4610
3237
- },
3238
- {
3239
- "epoch": 1.9898783245396792,
3240
- "grad_norm": 0.5084212422370911,
3241
- "learning_rate": 5.5794207808774904e-06,
3242
- "loss": 0.488,
3243
- "step": 4620
3244
- },
3245
- {
3246
- "epoch": 1.9941854204802412,
3247
- "grad_norm": 0.4913804531097412,
3248
- "learning_rate": 5.536864727480683e-06,
3249
- "loss": 0.5098,
3250
- "step": 4630
3251
- },
3252
- {
3253
- "epoch": 1.9984925164208032,
3254
- "grad_norm": 0.5197212100028992,
3255
- "learning_rate": 5.4944093697079136e-06,
3256
- "loss": 0.5066,
3257
- "step": 4640
3258
- },
3259
- {
3260
- "epoch": 2.002799612361365,
3261
- "grad_norm": 0.51143479347229,
3262
- "learning_rate": 5.45205566542143e-06,
3263
- "loss": 0.4521,
3264
- "step": 4650
3265
- },
3266
- {
3267
- "epoch": 2.0071067083019276,
3268
- "grad_norm": 0.5107315182685852,
3269
- "learning_rate": 5.4098045701899934e-06,
3270
- "loss": 0.3968,
3271
- "step": 4660
3272
- },
3273
- {
3274
- "epoch": 2.0114138042424896,
3275
- "grad_norm": 0.5407351851463318,
3276
- "learning_rate": 5.367657037267354e-06,
3277
- "loss": 0.3933,
3278
- "step": 4670
3279
- },
3280
- {
3281
- "epoch": 2.0157209001830516,
3282
- "grad_norm": 0.5835046172142029,
3283
- "learning_rate": 5.325614017570712e-06,
3284
- "loss": 0.3897,
3285
- "step": 4680
3286
- },
3287
- {
3288
- "epoch": 2.0200279961236136,
3289
- "grad_norm": 0.5047739744186401,
3290
- "learning_rate": 5.283676459659288e-06,
3291
- "loss": 0.3992,
3292
- "step": 4690
3293
- },
3294
- {
3295
- "epoch": 2.0243350920641756,
3296
- "grad_norm": 0.5422953963279724,
3297
- "learning_rate": 5.241845309712921e-06,
3298
- "loss": 0.4131,
3299
- "step": 4700
3300
- },
3301
- {
3302
- "epoch": 2.0286421880047376,
3303
- "grad_norm": 0.5471384525299072,
3304
- "learning_rate": 5.2001215115106814e-06,
3305
- "loss": 0.3955,
3306
- "step": 4710
3307
- },
3308
- {
3309
- "epoch": 2.0329492839453,
3310
- "grad_norm": 0.5800908803939819,
3311
- "learning_rate": 5.158506006409644e-06,
3312
- "loss": 0.397,
3313
- "step": 4720
3314
- },
3315
- {
3316
- "epoch": 2.037256379885862,
3317
- "grad_norm": 0.5329377055168152,
3318
- "learning_rate": 5.116999733323591e-06,
3319
- "loss": 0.4017,
3320
- "step": 4730
3321
- },
3322
- {
3323
- "epoch": 2.041563475826424,
3324
- "grad_norm": 0.556845486164093,
3325
- "learning_rate": 5.075603628701869e-06,
3326
- "loss": 0.4009,
3327
- "step": 4740
3328
- },
3329
- {
3330
- "epoch": 2.045870571766986,
3331
- "grad_norm": 0.5501790642738342,
3332
- "learning_rate": 5.034318626508223e-06,
3333
- "loss": 0.3969,
3334
- "step": 4750
3335
- },
3336
- {
3337
- "epoch": 2.050177667707548,
3338
- "grad_norm": 0.5467825531959534,
3339
- "learning_rate": 4.993145658199766e-06,
3340
- "loss": 0.3996,
3341
- "step": 4760
3342
- },
3343
- {
3344
- "epoch": 2.05448476364811,
3345
- "grad_norm": 0.5644121766090393,
3346
- "learning_rate": 4.952085652705938e-06,
3347
- "loss": 0.3926,
3348
- "step": 4770
3349
- },
3350
- {
3351
- "epoch": 2.0587918595886725,
3352
- "grad_norm": 0.5279033780097961,
3353
- "learning_rate": 4.911139536407542e-06,
3354
- "loss": 0.3742,
3355
- "step": 4780
3356
- },
3357
- {
3358
- "epoch": 2.0630989555292345,
3359
- "grad_norm": 0.5283676981925964,
3360
- "learning_rate": 4.870308233115876e-06,
3361
- "loss": 0.3893,
3362
- "step": 4790
3363
- },
3364
- {
3365
- "epoch": 2.0674060514697965,
3366
- "grad_norm": 0.5302291512489319,
3367
- "learning_rate": 4.82959266405184e-06,
3368
- "loss": 0.3956,
3369
- "step": 4800
3370
- },
3371
- {
3372
- "epoch": 2.0717131474103585,
3373
- "grad_norm": 0.5381713509559631,
3374
- "learning_rate": 4.788993747825209e-06,
3375
- "loss": 0.4124,
3376
- "step": 4810
3377
- },
3378
- {
3379
- "epoch": 2.0760202433509205,
3380
- "grad_norm": 0.5772622227668762,
3381
- "learning_rate": 4.748512400413861e-06,
3382
- "loss": 0.405,
3383
- "step": 4820
3384
- },
3385
- {
3386
- "epoch": 2.0803273392914825,
3387
- "grad_norm": 0.5383191704750061,
3388
- "learning_rate": 4.708149535143138e-06,
3389
- "loss": 0.3874,
3390
- "step": 4830
3391
- },
3392
- {
3393
- "epoch": 2.084634435232045,
3394
- "grad_norm": 0.5546970963478088,
3395
- "learning_rate": 4.667906062665234e-06,
3396
- "loss": 0.3994,
3397
- "step": 4840
3398
- },
3399
- {
3400
- "epoch": 2.088941531172607,
3401
- "grad_norm": 0.5541481375694275,
3402
- "learning_rate": 4.627782890938632e-06,
3403
- "loss": 0.4073,
3404
- "step": 4850
3405
- },
3406
- {
3407
- "epoch": 2.093248627113169,
3408
- "grad_norm": 0.5656886100769043,
3409
- "learning_rate": 4.587780925207654e-06,
3410
- "loss": 0.3986,
3411
- "step": 4860
3412
- },
3413
- {
3414
- "epoch": 2.097555723053731,
3415
- "grad_norm": 0.5167860984802246,
3416
- "learning_rate": 4.5479010679819965e-06,
3417
- "loss": 0.3994,
3418
- "step": 4870
3419
- },
3420
- {
3421
- "epoch": 2.101862818994293,
3422
- "grad_norm": 0.585415780544281,
3423
- "learning_rate": 4.50814421901641e-06,
3424
- "loss": 0.3959,
3425
- "step": 4880
3426
- },
3427
- {
3428
- "epoch": 2.1061699149348554,
3429
- "grad_norm": 0.5390037894248962,
3430
- "learning_rate": 4.46851127529035e-06,
3431
- "loss": 0.393,
3432
- "step": 4890
3433
- },
3434
- {
3435
- "epoch": 2.1104770108754174,
3436
- "grad_norm": 0.5685362815856934,
3437
- "learning_rate": 4.42900313098779e-06,
3438
- "loss": 0.4031,
3439
- "step": 4900
3440
- },
3441
- {
3442
- "epoch": 2.1147841068159794,
3443
- "grad_norm": 0.5294394493103027,
3444
- "learning_rate": 4.389620677477023e-06,
3445
- "loss": 0.3926,
3446
- "step": 4910
3447
- },
3448
- {
3449
- "epoch": 2.1190912027565414,
3450
- "grad_norm": 0.5693227648735046,
3451
- "learning_rate": 4.3503648032905384e-06,
3452
- "loss": 0.3909,
3453
- "step": 4920
3454
- },
3455
- {
3456
- "epoch": 2.1233982986971034,
3457
- "grad_norm": 0.6294069886207581,
3458
- "learning_rate": 4.311236394105006e-06,
3459
- "loss": 0.3908,
3460
- "step": 4930
3461
- },
3462
- {
3463
- "epoch": 2.1277053946376654,
3464
- "grad_norm": 0.566862165927887,
3465
- "learning_rate": 4.27223633272126e-06,
3466
- "loss": 0.4019,
3467
- "step": 4940
3468
- },
3469
- {
3470
- "epoch": 2.132012490578228,
3471
- "grad_norm": 0.5680539608001709,
3472
- "learning_rate": 4.233365499044416e-06,
3473
- "loss": 0.3957,
3474
- "step": 4950
3475
- },
3476
- {
3477
- "epoch": 2.13631958651879,
3478
- "grad_norm": 0.5697780251502991,
3479
- "learning_rate": 4.194624770063985e-06,
3480
- "loss": 0.3876,
3481
- "step": 4960
3482
- },
3483
- {
3484
- "epoch": 2.140626682459352,
3485
- "grad_norm": 0.5857852697372437,
3486
- "learning_rate": 4.1560150198341174e-06,
3487
- "loss": 0.3986,
3488
- "step": 4970
3489
- },
3490
- {
3491
- "epoch": 2.144933778399914,
3492
- "grad_norm": 0.5707722306251526,
3493
- "learning_rate": 4.11753711945386e-06,
3494
- "loss": 0.4165,
3495
- "step": 4980
3496
- },
3497
- {
3498
- "epoch": 2.149240874340476,
3499
- "grad_norm": 0.5498836040496826,
3500
- "learning_rate": 4.079191937047511e-06,
3501
- "loss": 0.4236,
3502
- "step": 4990
3503
- },
3504
- {
3505
- "epoch": 2.153547970281038,
3506
- "grad_norm": 0.6008414626121521,
3507
- "learning_rate": 4.040980337745044e-06,
3508
- "loss": 0.3955,
3509
- "step": 5000
3510
- },
3511
- {
3512
- "epoch": 2.1578550662216003,
3513
- "grad_norm": 0.5871570110321045,
3514
- "learning_rate": 4.002903183662566e-06,
3515
- "loss": 0.3939,
3516
- "step": 5010
3517
- },
3518
- {
3519
- "epoch": 2.1621621621621623,
3520
- "grad_norm": 0.5556260347366333,
3521
- "learning_rate": 3.964961333882893e-06,
3522
- "loss": 0.4005,
3523
- "step": 5020
3524
- },
3525
- {
3526
- "epoch": 2.1664692581027243,
3527
- "grad_norm": 0.5592585206031799,
3528
- "learning_rate": 3.927155644436144e-06,
3529
- "loss": 0.4035,
3530
- "step": 5030
3531
- },
3532
- {
3533
- "epoch": 2.1707763540432863,
3534
- "grad_norm": 0.5638931393623352,
3535
- "learning_rate": 3.889486968280448e-06,
3536
- "loss": 0.3961,
3537
- "step": 5040
3538
- },
3539
- {
3540
- "epoch": 2.1750834499838483,
3541
- "grad_norm": 0.5473156571388245,
3542
- "learning_rate": 3.851956155282682e-06,
3543
- "loss": 0.3999,
3544
- "step": 5050
3545
- },
3546
- {
3547
- "epoch": 2.1793905459244103,
3548
- "grad_norm": 0.7088154554367065,
3549
- "learning_rate": 3.814564052199313e-06,
3550
- "loss": 0.3919,
3551
- "step": 5060
3552
- },
3553
- {
3554
- "epoch": 2.1836976418649727,
3555
- "grad_norm": 0.569315493106842,
3556
- "learning_rate": 3.777311502657279e-06,
3557
- "loss": 0.3924,
3558
- "step": 5070
3559
- },
3560
- {
3561
- "epoch": 2.1880047378055347,
3562
- "grad_norm": 0.6128218770027161,
3563
- "learning_rate": 3.7401993471349616e-06,
3564
- "loss": 0.4094,
3565
- "step": 5080
3566
- },
3567
- {
3568
- "epoch": 2.1923118337460967,
3569
- "grad_norm": 0.5971004962921143,
3570
- "learning_rate": 3.7032284229432325e-06,
3571
- "loss": 0.3786,
3572
- "step": 5090
3573
- },
3574
- {
3575
- "epoch": 2.1966189296866587,
3576
- "grad_norm": 0.5701526999473572,
3577
- "learning_rate": 3.666399564206541e-06,
3578
- "loss": 0.3912,
3579
- "step": 5100
3580
- },
3581
- {
3582
- "epoch": 2.2009260256272207,
3583
- "grad_norm": 0.5547009706497192,
3584
- "learning_rate": 3.6297136018441215e-06,
3585
- "loss": 0.3866,
3586
- "step": 5110
3587
- },
3588
- {
3589
- "epoch": 2.2052331215677827,
3590
- "grad_norm": 0.5613463521003723,
3591
- "learning_rate": 3.59317136355122e-06,
3592
- "loss": 0.3926,
3593
- "step": 5120
3594
- },
3595
- {
3596
- "epoch": 2.209540217508345,
3597
- "grad_norm": 0.6126610040664673,
3598
- "learning_rate": 3.556773673780446e-06,
3599
- "loss": 0.389,
3600
- "step": 5130
3601
- },
3602
- {
3603
- "epoch": 2.213847313448907,
3604
- "grad_norm": 0.5699272751808167,
3605
- "learning_rate": 3.520521353723142e-06,
3606
- "loss": 0.3982,
3607
- "step": 5140
3608
- },
3609
- {
3610
- "epoch": 2.218154409389469,
3611
- "grad_norm": 0.593333899974823,
3612
- "learning_rate": 3.484415221290889e-06,
3613
- "loss": 0.3826,
3614
- "step": 5150
3615
- },
3616
- {
3617
- "epoch": 2.222461505330031,
3618
- "grad_norm": 0.6188777685165405,
3619
- "learning_rate": 3.448456091097023e-06,
3620
- "loss": 0.4,
3621
- "step": 5160
3622
- },
3623
- {
3624
- "epoch": 2.226768601270593,
3625
- "grad_norm": 0.5949888825416565,
3626
- "learning_rate": 3.4126447744382753e-06,
3627
- "loss": 0.4062,
3628
- "step": 5170
3629
- },
3630
- {
3631
- "epoch": 2.231075697211155,
3632
- "grad_norm": 0.5788257718086243,
3633
- "learning_rate": 3.376982079276464e-06,
3634
- "loss": 0.3881,
3635
- "step": 5180
3636
- },
3637
- {
3638
- "epoch": 2.2353827931517176,
3639
- "grad_norm": 0.5726456642150879,
3640
- "learning_rate": 3.3414688102202564e-06,
3641
- "loss": 0.3968,
3642
- "step": 5190
3643
- },
3644
- {
3645
- "epoch": 2.2396898890922796,
3646
- "grad_norm": 0.5855600833892822,
3647
- "learning_rate": 3.3061057685070354e-06,
3648
- "loss": 0.3925,
3649
- "step": 5200
3650
- },
3651
- {
3652
- "epoch": 2.2439969850328416,
3653
- "grad_norm": 0.5823237299919128,
3654
- "learning_rate": 3.2708937519847916e-06,
3655
- "loss": 0.3875,
3656
- "step": 5210
3657
- },
3658
- {
3659
- "epoch": 2.2483040809734036,
3660
- "grad_norm": 0.5852989554405212,
3661
- "learning_rate": 3.23583355509416e-06,
3662
- "loss": 0.3985,
3663
- "step": 5220
3664
- },
3665
- {
3666
- "epoch": 2.2526111769139656,
3667
- "grad_norm": 0.5461825728416443,
3668
- "learning_rate": 3.200925968850459e-06,
3669
- "loss": 0.3917,
3670
- "step": 5230
3671
- },
3672
- {
3673
- "epoch": 2.256918272854528,
3674
- "grad_norm": 0.5536659359931946,
3675
- "learning_rate": 3.166171780825876e-06,
3676
- "loss": 0.3963,
3677
- "step": 5240
3678
- },
3679
- {
3680
- "epoch": 2.26122536879509,
3681
- "grad_norm": 0.5736192464828491,
3682
- "learning_rate": 3.1315717751316755e-06,
3683
- "loss": 0.4114,
3684
- "step": 5250
3685
- },
3686
- {
3687
- "epoch": 2.265532464735652,
3688
- "grad_norm": 0.5808764100074768,
3689
- "learning_rate": 3.097126732400515e-06,
3690
- "loss": 0.3795,
3691
- "step": 5260
3692
- },
3693
- {
3694
- "epoch": 2.269839560676214,
3695
- "grad_norm": 0.5790621042251587,
3696
- "learning_rate": 3.0628374297688436e-06,
3697
- "loss": 0.3991,
3698
- "step": 5270
3699
- },
3700
- {
3701
- "epoch": 2.274146656616776,
3702
- "grad_norm": 0.5211635231971741,
3703
- "learning_rate": 3.0287046408593478e-06,
3704
- "loss": 0.3796,
3705
- "step": 5280
3706
- },
3707
- {
3708
- "epoch": 2.278453752557338,
3709
- "grad_norm": 0.6152241230010986,
3710
- "learning_rate": 2.994729135763522e-06,
3711
- "loss": 0.3976,
3712
- "step": 5290
3713
- },
3714
- {
3715
- "epoch": 2.2827608484979,
3716
- "grad_norm": 0.6017261147499084,
3717
- "learning_rate": 2.9609116810242677e-06,
3718
- "loss": 0.4031,
3719
- "step": 5300
3720
- },
3721
- {
3722
- "epoch": 2.2870679444384625,
3723
- "grad_norm": 0.5612776279449463,
3724
- "learning_rate": 2.9272530396186194e-06,
3725
- "loss": 0.3985,
3726
- "step": 5310
3727
- },
3728
- {
3729
- "epoch": 2.2913750403790245,
3730
- "grad_norm": 0.6065710186958313,
3731
- "learning_rate": 2.893753970940525e-06,
3732
- "loss": 0.3975,
3733
- "step": 5320
3734
- },
3735
- {
3736
- "epoch": 2.2956821363195865,
3737
- "grad_norm": 0.5793972611427307,
3738
- "learning_rate": 2.8604152307837064e-06,
3739
- "loss": 0.3889,
3740
- "step": 5330
3741
- },
3742
- {
3743
- "epoch": 2.2999892322601485,
3744
- "grad_norm": 0.5591062307357788,
3745
- "learning_rate": 2.8272375713246125e-06,
3746
- "loss": 0.3903,
3747
- "step": 5340
3748
- },
3749
- {
3750
- "epoch": 2.3042963282007105,
3751
- "grad_norm": 0.5505937337875366,
3752
- "learning_rate": 2.794221741105446e-06,
3753
- "loss": 0.397,
3754
- "step": 5350
3755
- },
3756
- {
3757
- "epoch": 2.308603424141273,
3758
- "grad_norm": 0.6174246668815613,
3759
- "learning_rate": 2.7613684850172882e-06,
3760
- "loss": 0.3966,
3761
- "step": 5360
3762
- },
3763
- {
3764
- "epoch": 2.312910520081835,
3765
- "grad_norm": 0.6093124747276306,
3766
- "learning_rate": 2.7286785442832685e-06,
3767
- "loss": 0.3902,
3768
- "step": 5370
3769
- },
3770
- {
3771
- "epoch": 2.317217616022397,
3772
- "grad_norm": 0.5350244045257568,
3773
- "learning_rate": 2.696152656441868e-06,
3774
- "loss": 0.3935,
3775
- "step": 5380
3776
- },
3777
- {
3778
- "epoch": 2.321524711962959,
3779
- "grad_norm": 0.5422816276550293,
3780
- "learning_rate": 2.663791555330255e-06,
3781
- "loss": 0.3924,
3782
- "step": 5390
3783
- },
3784
- {
3785
- "epoch": 2.325831807903521,
3786
- "grad_norm": 0.5582048892974854,
3787
- "learning_rate": 2.6315959710677464e-06,
3788
- "loss": 0.397,
3789
- "step": 5400
3790
- },
3791
- {
3792
- "epoch": 2.330138903844083,
3793
- "grad_norm": 0.5601301789283752,
3794
- "learning_rate": 2.599566630039332e-06,
3795
- "loss": 0.3813,
3796
- "step": 5410
3797
- },
3798
- {
3799
- "epoch": 2.334445999784645,
3800
- "grad_norm": 0.5601345896720886,
3801
- "learning_rate": 2.567704254879274e-06,
3802
- "loss": 0.3974,
3803
- "step": 5420
3804
- },
3805
- {
3806
- "epoch": 2.3387530957252074,
3807
- "grad_norm": 0.614778459072113,
3808
- "learning_rate": 2.536009564454817e-06,
3809
- "loss": 0.3836,
3810
- "step": 5430
3811
- },
3812
- {
3813
- "epoch": 2.3430601916657694,
3814
- "grad_norm": 0.5759994983673096,
3815
- "learning_rate": 2.504483273849958e-06,
3816
- "loss": 0.3949,
3817
- "step": 5440
3818
- },
3819
- {
3820
- "epoch": 2.3473672876063314,
3821
- "grad_norm": 0.586625874042511,
3822
- "learning_rate": 2.473126094349331e-06,
3823
- "loss": 0.3829,
3824
- "step": 5450
3825
- },
3826
- {
3827
- "epoch": 2.3516743835468934,
3828
- "grad_norm": 0.5470960736274719,
3829
- "learning_rate": 2.4419387334221333e-06,
3830
- "loss": 0.3881,
3831
- "step": 5460
3832
- },
3833
- {
3834
- "epoch": 2.3559814794874554,
3835
- "grad_norm": 0.5486071705818176,
3836
- "learning_rate": 2.4109218947061884e-06,
3837
- "loss": 0.399,
3838
- "step": 5470
3839
- },
3840
- {
3841
- "epoch": 2.360288575428018,
3842
- "grad_norm": 0.5942230820655823,
3843
- "learning_rate": 2.3800762779920574e-06,
3844
- "loss": 0.3921,
3845
- "step": 5480
3846
- },
3847
- {
3848
- "epoch": 2.36459567136858,
3849
- "grad_norm": 0.5786502957344055,
3850
- "learning_rate": 2.3494025792072474e-06,
3851
- "loss": 0.3901,
3852
- "step": 5490
3853
- },
3854
- {
3855
- "epoch": 2.368902767309142,
3856
- "grad_norm": 0.6082814931869507,
3857
- "learning_rate": 2.3189014904005247e-06,
3858
- "loss": 0.391,
3859
- "step": 5500
3860
- },
3861
- {
3862
- "epoch": 2.373209863249704,
3863
- "grad_norm": 0.612694501876831,
3864
- "learning_rate": 2.2885736997262863e-06,
3865
- "loss": 0.3981,
3866
- "step": 5510
3867
- },
3868
- {
3869
- "epoch": 2.377516959190266,
3870
- "grad_norm": 0.5050374865531921,
3871
- "learning_rate": 2.2584198914290435e-06,
3872
- "loss": 0.3951,
3873
- "step": 5520
3874
- },
3875
- {
3876
- "epoch": 2.381824055130828,
3877
- "grad_norm": 0.5465214848518372,
3878
- "learning_rate": 2.2284407458279743e-06,
3879
- "loss": 0.4,
3880
- "step": 5530
3881
- },
3882
- {
3883
- "epoch": 2.3861311510713903,
3884
- "grad_norm": 0.5544529557228088,
3885
- "learning_rate": 2.1986369393015914e-06,
3886
- "loss": 0.3836,
3887
- "step": 5540
3888
- },
3889
- {
3890
- "epoch": 2.3904382470119523,
3891
- "grad_norm": 0.586337149143219,
3892
- "learning_rate": 2.169009144272467e-06,
3893
- "loss": 0.4139,
3894
- "step": 5550
3895
- },
3896
- {
3897
- "epoch": 2.3947453429525143,
3898
- "grad_norm": 0.6219981908798218,
3899
- "learning_rate": 2.1395580291920625e-06,
3900
- "loss": 0.4011,
3901
- "step": 5560
3902
- },
3903
- {
3904
- "epoch": 2.3990524388930763,
3905
- "grad_norm": 0.6941688060760498,
3906
- "learning_rate": 2.110284258525658e-06,
3907
- "loss": 0.405,
3908
- "step": 5570
3909
- },
3910
- {
3911
- "epoch": 2.4033595348336383,
3912
- "grad_norm": 0.5210332274436951,
3913
- "learning_rate": 2.081188492737345e-06,
3914
- "loss": 0.4017,
3915
- "step": 5580
3916
- },
3917
- {
3918
- "epoch": 2.4076666307742007,
3919
- "grad_norm": 0.5930879712104797,
3920
- "learning_rate": 2.0522713882751445e-06,
3921
- "loss": 0.3918,
3922
- "step": 5590
3923
- },
3924
- {
3925
- "epoch": 2.4119737267147627,
3926
- "grad_norm": 0.5910641551017761,
3927
- "learning_rate": 2.0235335975561775e-06,
3928
- "loss": 0.3996,
3929
- "step": 5600
3930
- },
3931
- {
3932
- "epoch": 2.4162808226553247,
3933
- "grad_norm": 0.5827698111534119,
3934
- "learning_rate": 1.9949757689519555e-06,
3935
- "loss": 0.3854,
3936
- "step": 5610
3937
- },
3938
- {
3939
- "epoch": 2.4205879185958867,
3940
- "grad_norm": 0.5518185496330261,
3941
- "learning_rate": 1.966598546773757e-06,
3942
- "loss": 0.4077,
3943
- "step": 5620
3944
- },
3945
- {
3946
- "epoch": 2.4248950145364487,
3947
- "grad_norm": 0.6005439162254333,
3948
- "learning_rate": 1.938402571258073e-06,
3949
- "loss": 0.4095,
3950
- "step": 5630
3951
- },
3952
- {
3953
- "epoch": 2.4292021104770107,
3954
- "grad_norm": 0.5761522054672241,
3955
- "learning_rate": 1.9103884785521887e-06,
3956
- "loss": 0.3966,
3957
- "step": 5640
3958
- },
3959
- {
3960
- "epoch": 2.4335092064175727,
3961
- "grad_norm": 0.5546764135360718,
3962
- "learning_rate": 1.8825569006998012e-06,
3963
- "loss": 0.395,
3964
- "step": 5650
3965
- },
3966
- {
3967
- "epoch": 2.437816302358135,
3968
- "grad_norm": 0.5639533996582031,
3969
- "learning_rate": 1.8549084656267846e-06,
3970
- "loss": 0.3938,
3971
- "step": 5660
3972
- },
3973
- {
3974
- "epoch": 2.442123398298697,
3975
- "grad_norm": 0.5662581324577332,
3976
- "learning_rate": 1.8274437971270044e-06,
3977
- "loss": 0.4004,
3978
- "step": 5670
3979
- },
3980
- {
3981
- "epoch": 2.446430494239259,
3982
- "grad_norm": 0.5856819748878479,
3983
- "learning_rate": 1.8001635148482621e-06,
3984
- "loss": 0.3946,
3985
- "step": 5680
3986
- },
3987
- {
3988
- "epoch": 2.450737590179821,
3989
- "grad_norm": 0.5766512751579285,
3990
- "learning_rate": 1.7730682342782967e-06,
3991
- "loss": 0.3931,
3992
- "step": 5690
3993
- },
3994
- {
3995
- "epoch": 2.455044686120383,
3996
- "grad_norm": 0.6373909711837769,
3997
- "learning_rate": 1.7461585667309045e-06,
3998
- "loss": 0.4006,
3999
- "step": 5700
4000
- },
4001
- {
4002
- "epoch": 2.4593517820609456,
4003
- "grad_norm": 0.5694748759269714,
4004
- "learning_rate": 1.719435119332159e-06,
4005
- "loss": 0.3989,
4006
- "step": 5710
4007
- },
4008
- {
4009
- "epoch": 2.4636588780015076,
4010
- "grad_norm": 0.5339934229850769,
4011
- "learning_rate": 1.6928984950066918e-06,
4012
- "loss": 0.3966,
4013
- "step": 5720
4014
- },
4015
- {
4016
- "epoch": 2.4679659739420696,
4017
- "grad_norm": 0.5888383388519287,
4018
- "learning_rate": 1.6665492924641113e-06,
4019
- "loss": 0.3833,
4020
- "step": 5730
4021
- },
4022
- {
4023
- "epoch": 2.4722730698826316,
4024
- "grad_norm": 0.5573282241821289,
4025
- "learning_rate": 1.6403881061854732e-06,
4026
- "loss": 0.4,
4027
- "step": 5740
4028
- },
4029
- {
4030
- "epoch": 2.4765801658231936,
4031
- "grad_norm": 0.5756634473800659,
4032
- "learning_rate": 1.6144155264098883e-06,
4033
- "loss": 0.3964,
4034
- "step": 5750
4035
- },
4036
- {
4037
- "epoch": 2.4808872617637556,
4038
- "grad_norm": 0.5784355401992798,
4039
- "learning_rate": 1.58863213912119e-06,
4040
- "loss": 0.3762,
4041
- "step": 5760
4042
- },
4043
- {
4044
- "epoch": 2.4851943577043176,
4045
- "grad_norm": 0.6090006828308105,
4046
- "learning_rate": 1.563038526034727e-06,
4047
- "loss": 0.3986,
4048
- "step": 5770
4049
- },
4050
- {
4051
- "epoch": 2.48950145364488,
4052
- "grad_norm": 0.5565779209136963,
4053
- "learning_rate": 1.5376352645842242e-06,
4054
- "loss": 0.3916,
4055
- "step": 5780
4056
- },
4057
- {
4058
- "epoch": 2.493808549585442,
4059
- "grad_norm": 0.6107103228569031,
4060
- "learning_rate": 1.5124229279087655e-06,
4061
- "loss": 0.4093,
4062
- "step": 5790
4063
- },
4064
- {
4065
- "epoch": 2.498115645526004,
4066
- "grad_norm": 0.5300205945968628,
4067
- "learning_rate": 1.487402084839864e-06,
4068
- "loss": 0.4047,
4069
- "step": 5800
4070
- },
4071
- {
4072
- "epoch": 2.502422741466566,
4073
- "grad_norm": 0.6008495688438416,
4074
- "learning_rate": 1.4625732998886178e-06,
4075
- "loss": 0.4023,
4076
- "step": 5810
4077
- },
4078
- {
4079
- "epoch": 2.5067298374071285,
4080
- "grad_norm": 0.5560673475265503,
4081
- "learning_rate": 1.437937133232985e-06,
4082
- "loss": 0.3968,
4083
- "step": 5820
4084
- },
4085
- {
4086
- "epoch": 2.5110369333476905,
4087
- "grad_norm": 0.5503118634223938,
4088
- "learning_rate": 1.413494140705136e-06,
4089
- "loss": 0.3876,
4090
- "step": 5830
4091
- },
4092
- {
4093
- "epoch": 2.5153440292882525,
4094
- "grad_norm": 0.5559957027435303,
4095
- "learning_rate": 1.3892448737789243e-06,
4096
- "loss": 0.392,
4097
- "step": 5840
4098
- },
4099
- {
4100
- "epoch": 2.5196511252288145,
4101
- "grad_norm": 0.5354902148246765,
4102
- "learning_rate": 1.365189879557426e-06,
4103
- "loss": 0.3988,
4104
- "step": 5850
4105
- },
4106
- {
4107
- "epoch": 2.5239582211693765,
4108
- "grad_norm": 0.577046275138855,
4109
- "learning_rate": 1.3413297007606196e-06,
4110
- "loss": 0.3948,
4111
- "step": 5860
4112
- },
4113
- {
4114
- "epoch": 2.5282653171099385,
4115
- "grad_norm": 0.5745800733566284,
4116
- "learning_rate": 1.3176648757131205e-06,
4117
- "loss": 0.395,
4118
- "step": 5870
4119
- },
4120
- {
4121
- "epoch": 2.5325724130505005,
4122
- "grad_norm": 0.5721185207366943,
4123
- "learning_rate": 1.2941959383320478e-06,
4124
- "loss": 0.3918,
4125
- "step": 5880
4126
- },
4127
- {
4128
- "epoch": 2.5368795089910625,
4129
- "grad_norm": 0.5935482978820801,
4130
- "learning_rate": 1.2709234181149765e-06,
4131
- "loss": 0.376,
4132
- "step": 5890
4133
- },
4134
- {
4135
- "epoch": 2.541186604931625,
4136
- "grad_norm": 0.5709375143051147,
4137
- "learning_rate": 1.2478478401279848e-06,
4138
- "loss": 0.3881,
4139
- "step": 5900
4140
- },
4141
- {
4142
- "epoch": 2.545493700872187,
4143
- "grad_norm": 0.5233684182167053,
4144
- "learning_rate": 1.2249697249938197e-06,
4145
- "loss": 0.3945,
4146
- "step": 5910
4147
- },
4148
- {
4149
- "epoch": 2.549800796812749,
4150
- "grad_norm": 0.5812388062477112,
4151
- "learning_rate": 1.2022895888801333e-06,
4152
- "loss": 0.3984,
4153
- "step": 5920
4154
- },
4155
- {
4156
- "epoch": 2.554107892753311,
4157
- "grad_norm": 0.560550332069397,
4158
- "learning_rate": 1.1798079434878584e-06,
4159
- "loss": 0.3942,
4160
- "step": 5930
4161
- },
4162
- {
4163
- "epoch": 2.5584149886938734,
4164
- "grad_norm": 0.6010858416557312,
4165
- "learning_rate": 1.1575252960396422e-06,
4166
- "loss": 0.3851,
4167
- "step": 5940
4168
- },
4169
- {
4170
- "epoch": 2.5627220846344354,
4171
- "grad_norm": 0.5857875347137451,
4172
- "learning_rate": 1.1354421492684252e-06,
4173
- "loss": 0.3993,
4174
- "step": 5950
4175
- },
4176
- {
4177
- "epoch": 2.5670291805749974,
4178
- "grad_norm": 0.604179859161377,
4179
- "learning_rate": 1.1135590014060772e-06,
4180
- "loss": 0.388,
4181
- "step": 5960
4182
- },
4183
- {
4184
- "epoch": 2.5713362765155594,
4185
- "grad_norm": 0.569106936454773,
4186
- "learning_rate": 1.0918763461721648e-06,
4187
- "loss": 0.4014,
4188
- "step": 5970
4189
- },
4190
- {
4191
- "epoch": 2.5756433724561214,
4192
- "grad_norm": 0.5742547512054443,
4193
- "learning_rate": 1.0703946727628234e-06,
4194
- "loss": 0.3839,
4195
- "step": 5980
4196
- },
4197
- {
4198
- "epoch": 2.5799504683966834,
4199
- "grad_norm": 0.5561407208442688,
4200
- "learning_rate": 1.0491144658397e-06,
4201
- "loss": 0.3853,
4202
- "step": 5990
4203
- },
4204
- {
4205
- "epoch": 2.5842575643372454,
4206
- "grad_norm": 0.5482295155525208,
4207
- "learning_rate": 1.0280362055190341e-06,
4208
- "loss": 0.3876,
4209
- "step": 6000
4210
- },
4211
- {
4212
- "epoch": 2.588564660277808,
4213
- "grad_norm": 0.5737982392311096,
4214
- "learning_rate": 1.0071603673608176e-06,
4215
- "loss": 0.4059,
4216
- "step": 6010
4217
- },
4218
- {
4219
- "epoch": 2.59287175621837,
4220
- "grad_norm": 0.547715961933136,
4221
- "learning_rate": 9.864874223580668e-07,
4222
- "loss": 0.3837,
4223
- "step": 6020
4224
- },
4225
- {
4226
- "epoch": 2.597178852158932,
4227
- "grad_norm": 0.607851505279541,
4228
- "learning_rate": 9.66017836926203e-07,
4229
- "loss": 0.3779,
4230
- "step": 6030
4231
- },
4232
- {
4233
- "epoch": 2.601485948099494,
4234
- "grad_norm": 0.5557613968849182,
4235
- "learning_rate": 9.457520728925151e-07,
4236
- "loss": 0.3995,
4237
- "step": 6040
4238
- },
4239
- {
4240
- "epoch": 2.605793044040056,
4241
- "grad_norm": 0.5470052361488342,
4242
- "learning_rate": 9.256905874857535e-07,
4243
- "loss": 0.3916,
4244
- "step": 6050
4245
- },
4246
- {
4247
- "epoch": 2.6101001399806183,
4248
- "grad_norm": 0.5718830227851868,
4249
- "learning_rate": 9.058338333258032e-07,
4250
- "loss": 0.3997,
4251
- "step": 6060
4252
- },
4253
- {
4254
- "epoch": 2.6144072359211803,
4255
- "grad_norm": 0.5838637948036194,
4256
- "learning_rate": 8.861822584134882e-07,
4257
- "loss": 0.39,
4258
- "step": 6070
4259
- },
4260
- {
4261
- "epoch": 2.6187143318617423,
4262
- "grad_norm": 0.5819488763809204,
4263
- "learning_rate": 8.667363061204415e-07,
4264
- "loss": 0.4028,
4265
- "step": 6080
4266
- },
4267
- {
4268
- "epoch": 2.6230214278023043,
4269
- "grad_norm": 0.5477743744850159,
4270
- "learning_rate": 8.474964151791232e-07,
4271
- "loss": 0.3979,
4272
- "step": 6090
4273
- },
4274
- {
4275
- "epoch": 2.6273285237428663,
4276
- "grad_norm": 0.6217262744903564,
4277
- "learning_rate": 8.284630196729059e-07,
4278
- "loss": 0.3993,
4279
- "step": 6100
4280
- },
4281
- {
4282
- "epoch": 2.6316356196834283,
4283
- "grad_norm": 0.5514227747917175,
4284
- "learning_rate": 8.096365490262925e-07,
4285
- "loss": 0.4058,
4286
- "step": 6110
4287
- },
4288
- {
4289
- "epoch": 2.6359427156239903,
4290
- "grad_norm": 0.645946204662323,
4291
- "learning_rate": 7.910174279952232e-07,
4292
- "loss": 0.3992,
4293
- "step": 6120
4294
- },
4295
- {
4296
- "epoch": 2.6402498115645527,
4297
- "grad_norm": 0.5741420984268188,
4298
- "learning_rate": 7.726060766574883e-07,
4299
- "loss": 0.3938,
4300
- "step": 6130
4301
- },
4302
- {
4303
- "epoch": 2.6445569075051147,
4304
- "grad_norm": 0.5910946726799011,
4305
- "learning_rate": 7.544029104032558e-07,
4306
- "loss": 0.3898,
4307
- "step": 6140
4308
- },
4309
- {
4310
- "epoch": 2.6488640034456767,
4311
- "grad_norm": 0.5803595185279846,
4312
- "learning_rate": 7.364083399256971e-07,
4313
- "loss": 0.388,
4314
- "step": 6150
4315
- },
4316
- {
4317
- "epoch": 2.6531710993862387,
4318
- "grad_norm": 0.596809446811676,
4319
- "learning_rate": 7.186227712117266e-07,
4320
- "loss": 0.388,
4321
- "step": 6160
4322
- },
4323
- {
4324
- "epoch": 2.6574781953268007,
4325
- "grad_norm": 0.6213387250900269,
4326
- "learning_rate": 7.010466055328313e-07,
4327
- "loss": 0.3839,
4328
- "step": 6170
4329
- },
4330
- {
4331
- "epoch": 2.661785291267363,
4332
- "grad_norm": 0.5913180112838745,
4333
- "learning_rate": 6.836802394360276e-07,
4334
- "loss": 0.3989,
4335
- "step": 6180
4336
- },
4337
- {
4338
- "epoch": 2.666092387207925,
4339
- "grad_norm": 0.6089721322059631,
4340
- "learning_rate": 6.665240647349125e-07,
4341
- "loss": 0.4039,
4342
- "step": 6190
4343
- },
4344
- {
4345
- "epoch": 2.670399483148487,
4346
- "grad_norm": 0.5730729103088379,
4347
- "learning_rate": 6.495784685008133e-07,
4348
- "loss": 0.3951,
4349
- "step": 6200
4350
- },
4351
- {
4352
- "epoch": 2.674706579089049,
4353
- "grad_norm": 0.5562758445739746,
4354
- "learning_rate": 6.32843833054072e-07,
4355
- "loss": 0.3837,
4356
- "step": 6210
4357
- },
4358
- {
4359
- "epoch": 2.679013675029611,
4360
- "grad_norm": 0.5627213716506958,
4361
- "learning_rate": 6.16320535955407e-07,
4362
- "loss": 0.3712,
4363
- "step": 6220
4364
- },
4365
- {
4366
- "epoch": 2.683320770970173,
4367
- "grad_norm": 0.559660017490387,
4368
- "learning_rate": 6.000089499973971e-07,
4369
- "loss": 0.3901,
4370
- "step": 6230
4371
- },
4372
- {
4373
- "epoch": 2.687627866910735,
4374
- "grad_norm": 0.6018761992454529,
4375
- "learning_rate": 5.839094431960713e-07,
4376
- "loss": 0.383,
4377
- "step": 6240
4378
- },
4379
- {
4380
- "epoch": 2.6919349628512976,
4381
- "grad_norm": 0.5534284710884094,
4382
- "learning_rate": 5.680223787826089e-07,
4383
- "loss": 0.3925,
4384
- "step": 6250
4385
- },
4386
- {
4387
- "epoch": 2.6962420587918596,
4388
- "grad_norm": 0.5682888031005859,
4389
- "learning_rate": 5.523481151951427e-07,
4390
- "loss": 0.3929,
4391
- "step": 6260
4392
- },
4393
- {
4394
- "epoch": 2.7005491547324216,
4395
- "grad_norm": 0.6271238923072815,
4396
- "learning_rate": 5.368870060706677e-07,
4397
- "loss": 0.3942,
4398
- "step": 6270
4399
- },
4400
- {
4401
- "epoch": 2.7048562506729836,
4402
- "grad_norm": 0.5881267786026001,
4403
- "learning_rate": 5.216394002370695e-07,
4404
- "loss": 0.3876,
4405
- "step": 6280
4406
- },
4407
- {
4408
- "epoch": 2.709163346613546,
4409
- "grad_norm": 0.6085900068283081,
4410
- "learning_rate": 5.066056417052445e-07,
4411
- "loss": 0.3958,
4412
- "step": 6290
4413
- },
4414
- {
4415
- "epoch": 2.713470442554108,
4416
- "grad_norm": 0.5912172198295593,
4417
- "learning_rate": 4.917860696613541e-07,
4418
- "loss": 0.3887,
4419
- "step": 6300
4420
- },
4421
- {
4422
- "epoch": 2.71777753849467,
4423
- "grad_norm": 0.6698789596557617,
4424
- "learning_rate": 4.771810184591541e-07,
4425
- "loss": 0.3899,
4426
- "step": 6310
4427
- },
4428
- {
4429
- "epoch": 2.722084634435232,
4430
- "grad_norm": 0.5682712197303772,
4431
- "learning_rate": 4.627908176124618e-07,
4432
- "loss": 0.3826,
4433
- "step": 6320
4434
- },
4435
- {
4436
- "epoch": 2.726391730375794,
4437
- "grad_norm": 0.5702280402183533,
4438
- "learning_rate": 4.486157917877232e-07,
4439
- "loss": 0.3908,
4440
- "step": 6330
4441
- },
4442
- {
4443
- "epoch": 2.730698826316356,
4444
- "grad_norm": 0.5540564060211182,
4445
- "learning_rate": 4.346562607966787e-07,
4446
- "loss": 0.3962,
4447
- "step": 6340
4448
- },
4449
- {
4450
- "epoch": 2.735005922256918,
4451
- "grad_norm": 0.6031074523925781,
4452
- "learning_rate": 4.209125395891589e-07,
4453
- "loss": 0.3791,
4454
- "step": 6350
4455
- },
4456
- {
4457
- "epoch": 2.73931301819748,
4458
- "grad_norm": 0.5727553963661194,
4459
- "learning_rate": 4.0738493824596715e-07,
4460
- "loss": 0.4023,
4461
- "step": 6360
4462
- },
4463
- {
4464
- "epoch": 2.7436201141380425,
4465
- "grad_norm": 0.5374717116355896,
4466
- "learning_rate": 3.940737619718937e-07,
4467
- "loss": 0.38,
4468
- "step": 6370
4469
- },
4470
- {
4471
- "epoch": 2.7479272100786045,
4472
- "grad_norm": 0.5720168352127075,
4473
- "learning_rate": 3.809793110888249e-07,
4474
- "loss": 0.4011,
4475
- "step": 6380
4476
- },
4477
- {
4478
- "epoch": 2.7522343060191665,
4479
- "grad_norm": 0.5751203894615173,
4480
- "learning_rate": 3.6810188102896605e-07,
4481
- "loss": 0.3941,
4482
- "step": 6390
4483
- },
4484
- {
4485
- "epoch": 2.7565414019597285,
4486
- "grad_norm": 0.5838513970375061,
4487
- "learning_rate": 3.554417623281825e-07,
4488
- "loss": 0.3834,
4489
- "step": 6400
4490
- },
4491
- {
4492
- "epoch": 2.760848497900291,
4493
- "grad_norm": 0.6204310059547424,
4494
- "learning_rate": 3.429992406194338e-07,
4495
- "loss": 0.3933,
4496
- "step": 6410
4497
- },
4498
- {
4499
- "epoch": 2.765155593840853,
4500
- "grad_norm": 0.6237754225730896,
4501
- "learning_rate": 3.3077459662634205e-07,
4502
- "loss": 0.3911,
4503
- "step": 6420
4504
- },
4505
- {
4506
- "epoch": 2.769462689781415,
4507
- "grad_norm": 0.561553418636322,
4508
- "learning_rate": 3.1876810615684705e-07,
4509
- "loss": 0.3847,
4510
- "step": 6430
4511
- },
4512
- {
4513
- "epoch": 2.773769785721977,
4514
- "grad_norm": 0.568580150604248,
4515
- "learning_rate": 3.069800400969947e-07,
4516
- "loss": 0.3967,
4517
- "step": 6440
4518
- },
4519
- {
4520
- "epoch": 2.778076881662539,
4521
- "grad_norm": 0.6103531122207642,
4522
- "learning_rate": 2.954106644048127e-07,
4523
- "loss": 0.3731,
4524
- "step": 6450
4525
- },
4526
- {
4527
- "epoch": 2.782383977603101,
4528
- "grad_norm": 0.560199499130249,
4529
- "learning_rate": 2.840602401043213e-07,
4530
- "loss": 0.3889,
4531
- "step": 6460
4532
- },
4533
- {
4534
- "epoch": 2.786691073543663,
4535
- "grad_norm": 0.5612174868583679,
4536
- "learning_rate": 2.7292902327963776e-07,
4537
- "loss": 0.3915,
4538
- "step": 6470
4539
- },
4540
- {
4541
- "epoch": 2.7909981694842254,
4542
- "grad_norm": 0.5860500335693359,
4543
- "learning_rate": 2.620172650692021e-07,
4544
- "loss": 0.4063,
4545
- "step": 6480
4546
- },
4547
- {
4548
- "epoch": 2.7953052654247874,
4549
- "grad_norm": 0.6044652462005615,
4550
- "learning_rate": 2.513252116601062e-07,
4551
- "loss": 0.39,
4552
- "step": 6490
4553
- },
4554
- {
4555
- "epoch": 2.7996123613653494,
4556
- "grad_norm": 0.5966377258300781,
4557
- "learning_rate": 2.408531042825446e-07,
4558
- "loss": 0.3965,
4559
- "step": 6500
4560
- },
4561
- {
4562
- "epoch": 2.8039194573059114,
4563
- "grad_norm": 0.5729289650917053,
4564
- "learning_rate": 2.3060117920437164e-07,
4565
- "loss": 0.3798,
4566
- "step": 6510
4567
- },
4568
- {
4569
- "epoch": 2.8082265532464734,
4570
- "grad_norm": 0.6403810977935791,
4571
- "learning_rate": 2.2056966772576626e-07,
4572
- "loss": 0.4096,
4573
- "step": 6520
4574
- },
4575
- {
4576
- "epoch": 2.812533649187036,
4577
- "grad_norm": 0.5852852463722229,
4578
- "learning_rate": 2.1075879617401984e-07,
4579
- "loss": 0.383,
4580
- "step": 6530
4581
- },
4582
- {
4583
- "epoch": 2.816840745127598,
4584
- "grad_norm": 0.6858223080635071,
4585
- "learning_rate": 2.0116878589842236e-07,
4586
- "loss": 0.3763,
4587
- "step": 6540
4588
- },
4589
- {
4590
- "epoch": 2.82114784106816,
4591
- "grad_norm": 0.5583459138870239,
4592
- "learning_rate": 1.917998532652765e-07,
4593
- "loss": 0.4007,
4594
- "step": 6550
4595
- },
4596
- {
4597
- "epoch": 2.825454937008722,
4598
- "grad_norm": 0.6212313175201416,
4599
- "learning_rate": 1.8265220965300812e-07,
4600
- "loss": 0.3946,
4601
- "step": 6560
4602
- },
4603
- {
4604
- "epoch": 2.829762032949284,
4605
- "grad_norm": 0.5777102112770081,
4606
- "learning_rate": 1.7372606144740567e-07,
4607
- "loss": 0.3908,
4608
- "step": 6570
4609
- },
4610
- {
4611
- "epoch": 2.834069128889846,
4612
- "grad_norm": 0.5885289311408997,
4613
- "learning_rate": 1.6502161003695615e-07,
4614
- "loss": 0.4051,
4615
- "step": 6580
4616
- },
4617
- {
4618
- "epoch": 2.838376224830408,
4619
- "grad_norm": 0.6133362054824829,
4620
- "learning_rate": 1.5653905180830432e-07,
4621
- "loss": 0.3909,
4622
- "step": 6590
4623
- },
4624
- {
4625
- "epoch": 2.8426833207709703,
4626
- "grad_norm": 0.5662548542022705,
4627
- "learning_rate": 1.48278578141825e-07,
4628
- "loss": 0.3689,
4629
- "step": 6600
4630
- },
4631
- {
4632
- "epoch": 2.8469904167115323,
4633
- "grad_norm": 0.5703479647636414,
4634
- "learning_rate": 1.4024037540730006e-07,
4635
- "loss": 0.3812,
4636
- "step": 6610
4637
- },
4638
- {
4639
- "epoch": 2.8512975126520943,
4640
- "grad_norm": 0.5604844689369202,
4641
- "learning_rate": 1.324246249597183e-07,
4642
- "loss": 0.3992,
4643
- "step": 6620
4644
- },
4645
- {
4646
- "epoch": 2.8556046085926563,
4647
- "grad_norm": 0.6033147573471069,
4648
- "learning_rate": 1.2483150313517766e-07,
4649
- "loss": 0.3937,
4650
- "step": 6630
4651
- },
4652
- {
4653
- "epoch": 2.8599117045332187,
4654
- "grad_norm": 0.5846080780029297,
4655
- "learning_rate": 1.1746118124691508e-07,
4656
- "loss": 0.4123,
4657
- "step": 6640
4658
- },
4659
- {
4660
- "epoch": 2.8642188004737807,
4661
- "grad_norm": 0.63025963306427,
4662
- "learning_rate": 1.103138255814329e-07,
4663
- "loss": 0.3998,
4664
- "step": 6650
4665
- },
4666
- {
4667
- "epoch": 2.8685258964143427,
4668
- "grad_norm": 0.5580465197563171,
4669
- "learning_rate": 1.0338959739475296e-07,
4670
- "loss": 0.4007,
4671
- "step": 6660
4672
- },
4673
- {
4674
- "epoch": 2.8728329923549047,
4675
- "grad_norm": 0.5767059326171875,
4676
- "learning_rate": 9.66886529087785e-08,
4677
- "loss": 0.4008,
4678
- "step": 6670
4679
- },
4680
- {
4681
- "epoch": 2.8771400882954667,
4682
- "grad_norm": 0.583044707775116,
4683
- "learning_rate": 9.021114330776348e-08,
4684
- "loss": 0.403,
4685
- "step": 6680
4686
- },
4687
- {
4688
- "epoch": 2.8814471842360287,
4689
- "grad_norm": 0.5440847873687744,
4690
- "learning_rate": 8.395721473490992e-08,
4691
- "loss": 0.3839,
4692
- "step": 6690
4693
- },
4694
- {
4695
- "epoch": 2.8857542801765907,
4696
- "grad_norm": 0.55162513256073,
4697
- "learning_rate": 7.792700828906374e-08,
4698
- "loss": 0.4017,
4699
- "step": 6700
4700
- },
4701
- {
4702
- "epoch": 2.8900613761171527,
4703
- "grad_norm": 0.5817933082580566,
4704
- "learning_rate": 7.212066002153518e-08,
4705
- "loss": 0.4009,
4706
- "step": 6710
4707
- },
4708
- {
4709
- "epoch": 2.894368472057715,
4710
- "grad_norm": 0.6080750226974487,
4711
- "learning_rate": 6.653830093302782e-08,
4712
- "loss": 0.3964,
4713
- "step": 6720
4714
- },
4715
- {
4716
- "epoch": 2.898675567998277,
4717
- "grad_norm": 0.5681482553482056,
4718
- "learning_rate": 6.11800569706833e-08,
4719
- "loss": 0.4003,
4720
- "step": 6730
4721
- },
4722
- {
4723
- "epoch": 2.902982663938839,
4724
- "grad_norm": 0.5769705176353455,
4725
- "learning_rate": 5.604604902524235e-08,
4726
- "loss": 0.4017,
4727
- "step": 6740
4728
- },
4729
- {
4730
- "epoch": 2.907289759879401,
4731
- "grad_norm": 0.546116828918457,
4732
- "learning_rate": 5.113639292831152e-08,
4733
- "loss": 0.3828,
4734
- "step": 6750
4735
- },
4736
- {
4737
- "epoch": 2.9115968558199636,
4738
- "grad_norm": 0.590798020362854,
4739
- "learning_rate": 4.645119944975296e-08,
4740
- "loss": 0.3853,
4741
- "step": 6760
4742
- },
4743
- {
4744
- "epoch": 2.9159039517605256,
4745
- "grad_norm": 0.5748469233512878,
4746
- "learning_rate": 4.1990574295187606e-08,
4747
- "loss": 0.4107,
4748
- "step": 6770
4749
- },
4750
- {
4751
- "epoch": 2.9202110477010876,
4752
- "grad_norm": 0.5733410716056824,
4753
- "learning_rate": 3.7754618103608144e-08,
4754
- "loss": 0.4052,
4755
- "step": 6780
4756
- },
4757
- {
4758
- "epoch": 2.9245181436416496,
4759
- "grad_norm": 0.5576743483543396,
4760
- "learning_rate": 3.374342644510531e-08,
4761
- "loss": 0.3846,
4762
- "step": 6790
4763
- },
4764
- {
4765
- "epoch": 2.9288252395822116,
4766
- "grad_norm": 0.596834123134613,
4767
- "learning_rate": 2.9957089818718476e-08,
4768
- "loss": 0.4029,
4769
- "step": 6800
4770
- },
4771
- {
4772
- "epoch": 2.9331323355227736,
4773
- "grad_norm": 0.5680873990058899,
4774
- "learning_rate": 2.639569365038841e-08,
4775
- "loss": 0.381,
4776
- "step": 6810
4777
- },
4778
- {
4779
- "epoch": 2.9374394314633356,
4780
- "grad_norm": 0.5597060918807983,
4781
- "learning_rate": 2.305931829102992e-08,
4782
- "loss": 0.3974,
4783
- "step": 6820
4784
- },
4785
- {
4786
- "epoch": 2.941746527403898,
4787
- "grad_norm": 0.5827191472053528,
4788
- "learning_rate": 1.9948039014724417e-08,
4789
- "loss": 0.3973,
4790
- "step": 6830
4791
- },
4792
- {
4793
- "epoch": 2.94605362334446,
4794
- "grad_norm": 0.6119829416275024,
4795
- "learning_rate": 1.706192601701462e-08,
4796
- "loss": 0.3984,
4797
- "step": 6840
4798
- },
4799
- {
4800
- "epoch": 2.950360719285022,
4801
- "grad_norm": 0.602497935295105,
4802
- "learning_rate": 1.4401044413324682e-08,
4803
- "loss": 0.4086,
4804
- "step": 6850
4805
- },
4806
- {
4807
- "epoch": 2.954667815225584,
4808
- "grad_norm": 0.5783790349960327,
4809
- "learning_rate": 1.1965454237493623e-08,
4810
- "loss": 0.3945,
4811
- "step": 6860
4812
- },
4813
- {
4814
- "epoch": 2.958974911166146,
4815
- "grad_norm": 0.5653091073036194,
4816
- "learning_rate": 9.755210440413055e-09,
4817
- "loss": 0.3938,
4818
- "step": 6870
4819
- },
4820
- {
4821
- "epoch": 2.9632820071067085,
4822
- "grad_norm": 0.5716846585273743,
4823
- "learning_rate": 7.770362888795957e-09,
4824
- "loss": 0.3935,
4825
- "step": 6880
4826
- },
4827
- {
4828
- "epoch": 2.9675891030472705,
4829
- "grad_norm": 0.6015262603759766,
4830
- "learning_rate": 6.0109563640442515e-09,
4831
- "loss": 0.3955,
4832
- "step": 6890
4833
- },
4834
- {
4835
- "epoch": 2.9718961989878325,
4836
- "grad_norm": 0.5763514041900635,
4837
- "learning_rate": 4.477030561246265e-09,
4838
- "loss": 0.4069,
4839
- "step": 6900
4840
- },
4841
- {
4842
- "epoch": 2.9762032949283945,
4843
- "grad_norm": 0.5644577741622925,
4844
- "learning_rate": 3.168620088271901e-09,
4845
- "loss": 0.3921,
4846
- "step": 6910
4847
- },
4848
- {
4849
- "epoch": 2.9805103908689565,
4850
- "grad_norm": 0.5302848219871521,
4851
- "learning_rate": 2.0857544650010332e-09,
4852
- "loss": 0.404,
4853
- "step": 6920
4854
- },
4855
- {
4856
- "epoch": 2.9848174868095185,
4857
- "grad_norm": 0.6025976538658142,
4858
- "learning_rate": 1.2284581226507108e-09,
4859
- "loss": 0.4037,
4860
- "step": 6930
4861
- },
4862
- {
4863
- "epoch": 2.9891245827500805,
4864
- "grad_norm": 0.5681896805763245,
4865
- "learning_rate": 5.967504032267091e-10,
4866
- "loss": 0.4031,
4867
- "step": 6940
4868
- },
4869
- {
4870
- "epoch": 2.993431678690643,
4871
- "grad_norm": 0.5708478093147278,
4872
- "learning_rate": 1.906455590883205e-10,
4873
- "loss": 0.4206,
4874
- "step": 6950
4875
- },
4876
- {
4877
- "epoch": 2.997738774631205,
4878
- "grad_norm": 0.5966918468475342,
4879
- "learning_rate": 1.015275262306048e-11,
4880
- "loss": 0.4014,
4881
- "step": 6960
4882
  }
4883
  ],
4884
  "logging_steps": 10,
@@ -4893,12 +3171,12 @@
4893
  "should_evaluate": false,
4894
  "should_log": false,
4895
  "should_save": true,
4896
- "should_training_stop": true
4897
  },
4898
  "attributes": {}
4899
  }
4900
  },
4901
- "total_flos": 7523782707118080.0,
4902
  "train_batch_size": 16,
4903
  "trial_name": null,
4904
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 1.9381931732529343,
5
  "eval_steps": 500,
6
+ "global_step": 4500,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
3157
  "learning_rate": 6.097595982676103e-06,
3158
  "loss": 0.5065,
3159
  "step": 4500
 
3160
  }
3161
  ],
3162
  "logging_steps": 10,
 
3171
  "should_evaluate": false,
3172
  "should_log": false,
3173
  "should_save": true,
3174
+ "should_training_stop": false
3175
  },
3176
  "attributes": {}
3177
  }
3178
  },
3179
+ "total_flos": 4861580908953600.0,
3180
  "train_batch_size": 16,
3181
  "trial_name": null,
3182
  "trial_params": null