09/23/2023 12:10:45 - WARNING - __main__ -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
09/23/2023 12:11:04 - INFO - __main__ -   Training/evaluation parameters Namespace(train_file='../../../data/mcqa/atomic/train_atm_n_2i_half_sample_name.jsonl', dev_file='../../../data/mcqa/atomic/dev_random_10k.jsonl', model_type='deberta-mlm', model_name_or_path='microsoft/deberta-v3-large', config_name='', tokenizer_name='', cache_dir='.cache', task_name='atomic', output_dir='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', second_train_file=None, second_dev_file=None, max_seq_length=128, max_words_to_mask=6, max_sequence_per_time=80, do_train=True, do_eval=True, do_ext_eval=True, evaluate_during_training=True, do_lower_case=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=16, gradient_accumulation_steps=16, margin=1.0, learning_rate=5e-06, weight_decay=0.01, adam_epsilon=1e-06, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, warmup_proportion=0.05, logging_steps=50, save_steps=500, logits_file='logits_test.txt', results_file='eval_results.txt', no_cuda=False, overwrite_output_dir=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, server_ip='', server_port='', eval_output_dir='./eval_results', n_gpu=1, device=device(type='cuda'))
09/23/2023 12:11:13 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 12:11:13 - INFO - __main__ -     Num examples = 10000
09/23/2023 12:11:13 - INFO - __main__ -     Batch size = 16
09/23/2023 12:15:11 - INFO - __main__ -   ***** Eval results *****
09/23/2023 12:15:11 - INFO - __main__ -     acc = 0.3392
09/23/2023 12:25:13 - INFO - __main__ -   warm up steps = 835
09/23/2023 12:25:13 - INFO - __main__ -   ***** Running training *****
09/23/2023 12:25:13 - INFO - __main__ -     Num examples = 534833
09/23/2023 12:25:13 - INFO - __main__ -     Num Epochs = 1
09/23/2023 12:25:13 - INFO - __main__ -     Instantaneous batch size per GPU = 2
09/23/2023 12:25:13 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 32
09/23/2023 12:25:13 - INFO - __main__ -     Gradient Accumulation steps = 16
09/23/2023 12:25:13 - INFO - __main__ -     Total optimization steps = 16713
09/23/2023 12:28:54 - INFO - __main__ -    global_step = 50, average loss = 0.6903331369534135
09/23/2023 12:32:33 - INFO - __main__ -    global_step = 100, average loss = 0.6819266405794769
09/23/2023 12:36:13 - INFO - __main__ -    global_step = 150, average loss = 0.6690767159638926
09/23/2023 12:39:56 - INFO - __main__ -    global_step = 200, average loss = 0.6476348407182377
09/23/2023 12:43:39 - INFO - __main__ -    global_step = 250, average loss = 0.6220815655076877
09/23/2023 12:47:19 - INFO - __main__ -    global_step = 300, average loss = 0.5299683179453859
09/23/2023 12:50:56 - INFO - __main__ -    global_step = 350, average loss = 0.39345016410181416
09/23/2023 12:54:38 - INFO - __main__ -    global_step = 400, average loss = 0.31127411118301096
09/23/2023 12:58:19 - INFO - __main__ -    global_step = 450, average loss = 0.25150225180907
09/23/2023 13:02:00 - INFO - __main__ -    global_step = 500, average loss = 0.22586858159028453
09/23/2023 13:02:01 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 13:02:01 - INFO - __main__ -     Num examples = 10000
09/23/2023 13:02:01 - INFO - __main__ -     Batch size = 16
09/23/2023 13:05:56 - INFO - __main__ -   ***** Eval results *****
09/23/2023 13:05:56 - INFO - __main__ -     acc = 0.6996
09/23/2023 13:06:23 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 13:10:02 - INFO - __main__ -    global_step = 550, average loss = 0.22251796642665794
09/23/2023 13:13:46 - INFO - __main__ -    global_step = 600, average loss = 0.19366045010890956
09/23/2023 13:17:29 - INFO - __main__ -    global_step = 650, average loss = 0.18587105088678071
09/23/2023 13:21:15 - INFO - __main__ -    global_step = 700, average loss = 0.1760789550206391
09/23/2023 13:24:59 - INFO - __main__ -    global_step = 750, average loss = 0.18312411408871412
09/23/2023 13:28:42 - INFO - __main__ -    global_step = 800, average loss = 0.15576540186157217
09/23/2023 13:32:25 - INFO - __main__ -    global_step = 850, average loss = 0.16302873345994157
09/23/2023 13:36:07 - INFO - __main__ -    global_step = 900, average loss = 0.15725697406036487
09/23/2023 13:39:46 - INFO - __main__ -    global_step = 950, average loss = 0.15640976145299645
09/23/2023 13:43:33 - INFO - __main__ -    global_step = 1000, average loss = 0.15606625928507128
09/23/2023 13:43:34 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 13:43:34 - INFO - __main__ -     Num examples = 10000
09/23/2023 13:43:34 - INFO - __main__ -     Batch size = 16
09/23/2023 13:47:30 - INFO - __main__ -   ***** Eval results *****
09/23/2023 13:47:30 - INFO - __main__ -     acc = 0.7961
09/23/2023 13:47:58 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 13:51:41 - INFO - __main__ -    global_step = 1050, average loss = 0.14431810150181262
09/23/2023 13:55:20 - INFO - __main__ -    global_step = 1100, average loss = 0.15233074207513708
09/23/2023 13:59:01 - INFO - __main__ -    global_step = 1150, average loss = 0.1404175848151772
09/23/2023 14:02:44 - INFO - __main__ -    global_step = 1200, average loss = 0.12134294869215864
09/23/2023 14:06:20 - INFO - __main__ -    global_step = 1250, average loss = 0.1363200130731275
09/23/2023 14:09:59 - INFO - __main__ -    global_step = 1300, average loss = 0.13769450530940958
09/23/2023 14:13:43 - INFO - __main__ -    global_step = 1350, average loss = 0.12156560226379952
09/23/2023 14:17:18 - INFO - __main__ -    global_step = 1400, average loss = 0.12623315585107775
09/23/2023 14:20:59 - INFO - __main__ -    global_step = 1450, average loss = 0.14377202547417256
09/23/2023 14:24:33 - INFO - __main__ -    global_step = 1500, average loss = 0.1286695548933858
09/23/2023 14:24:34 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 14:24:34 - INFO - __main__ -     Num examples = 10000
09/23/2023 14:24:34 - INFO - __main__ -     Batch size = 16
09/23/2023 14:28:29 - INFO - __main__ -   ***** Eval results *****
09/23/2023 14:28:29 - INFO - __main__ -     acc = 0.8048
09/23/2023 14:28:56 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 14:32:42 - INFO - __main__ -    global_step = 1550, average loss = 0.1198868363915244
09/23/2023 14:36:24 - INFO - __main__ -    global_step = 1600, average loss = 0.12324378551486007
09/23/2023 14:40:00 - INFO - __main__ -    global_step = 1650, average loss = 0.11938468464672042
09/23/2023 14:43:41 - INFO - __main__ -    global_step = 1700, average loss = 0.14236379045556533
09/23/2023 14:47:22 - INFO - __main__ -    global_step = 1750, average loss = 0.13320694023670512
09/23/2023 14:51:02 - INFO - __main__ -    global_step = 1800, average loss = 0.13622453257718006
09/23/2023 14:54:42 - INFO - __main__ -    global_step = 1850, average loss = 0.13987649206645072
09/23/2023 14:58:22 - INFO - __main__ -    global_step = 1900, average loss = 0.12299754774277971
09/23/2023 15:02:05 - INFO - __main__ -    global_step = 1950, average loss = 0.11868109124743569
09/23/2023 15:05:47 - INFO - __main__ -    global_step = 2000, average loss = 0.1415042275990345
09/23/2023 15:05:47 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 15:05:47 - INFO - __main__ -     Num examples = 10000
09/23/2023 15:05:47 - INFO - __main__ -     Batch size = 16
09/23/2023 15:09:43 - INFO - __main__ -   ***** Eval results *****
09/23/2023 15:09:43 - INFO - __main__ -     acc = 0.8063
09/23/2023 15:10:10 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 15:13:51 - INFO - __main__ -    global_step = 2050, average loss = 0.11399275673671581
09/23/2023 15:17:31 - INFO - __main__ -    global_step = 2100, average loss = 0.1065546132405143
09/23/2023 15:21:11 - INFO - __main__ -    global_step = 2150, average loss = 0.12809142941467144
09/23/2023 15:24:51 - INFO - __main__ -    global_step = 2200, average loss = 0.12454848410692648
09/23/2023 15:28:34 - INFO - __main__ -    global_step = 2250, average loss = 0.10986286829065647
09/23/2023 15:32:14 - INFO - __main__ -    global_step = 2300, average loss = 0.11237965747121052
09/23/2023 15:35:56 - INFO - __main__ -    global_step = 2350, average loss = 0.10897610924319451
09/23/2023 15:39:41 - INFO - __main__ -    global_step = 2400, average loss = 0.12056981857070241
09/23/2023 15:43:24 - INFO - __main__ -    global_step = 2450, average loss = 0.13911059297635803
09/23/2023 15:47:10 - INFO - __main__ -    global_step = 2500, average loss = 0.11335444856034883
09/23/2023 15:47:10 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 15:47:10 - INFO - __main__ -     Num examples = 10000
09/23/2023 15:47:10 - INFO - __main__ -     Batch size = 16
09/23/2023 15:51:06 - INFO - __main__ -   ***** Eval results *****
09/23/2023 15:51:06 - INFO - __main__ -     acc = 0.8234
09/23/2023 15:51:32 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 15:55:10 - INFO - __main__ -    global_step = 2550, average loss = 0.12103958850973867
09/23/2023 15:58:57 - INFO - __main__ -    global_step = 2600, average loss = 0.11913071399074397
09/23/2023 16:02:38 - INFO - __main__ -    global_step = 2650, average loss = 0.11255583499452769
09/23/2023 16:06:28 - INFO - __main__ -    global_step = 2700, average loss = 0.1006322616293619
09/23/2023 16:10:12 - INFO - __main__ -    global_step = 2750, average loss = 0.0932968783121487
09/23/2023 16:13:51 - INFO - __main__ -    global_step = 2800, average loss = 0.11056979637924087
09/23/2023 16:17:38 - INFO - __main__ -    global_step = 2850, average loss = 0.12318793082176853
09/23/2023 16:21:21 - INFO - __main__ -    global_step = 2900, average loss = 0.10864610994302439
09/23/2023 16:25:03 - INFO - __main__ -    global_step = 2950, average loss = 0.11261582636667299
09/23/2023 16:28:40 - INFO - __main__ -    global_step = 3000, average loss = 0.12150005620278534
09/23/2023 16:28:40 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 16:28:40 - INFO - __main__ -     Num examples = 10000
09/23/2023 16:28:40 - INFO - __main__ -     Batch size = 16
09/23/2023 16:32:35 - INFO - __main__ -   ***** Eval results *****
09/23/2023 16:32:35 - INFO - __main__ -     acc = 0.8261
09/23/2023 16:33:02 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 16:36:46 - INFO - __main__ -    global_step = 3050, average loss = 0.10565035182957218
09/23/2023 16:40:30 - INFO - __main__ -    global_step = 3100, average loss = 0.10429829731896462
09/23/2023 16:44:14 - INFO - __main__ -    global_step = 3150, average loss = 0.10812272985053824
09/23/2023 16:47:54 - INFO - __main__ -    global_step = 3200, average loss = 0.12238092143270478
09/23/2023 16:51:33 - INFO - __main__ -    global_step = 3250, average loss = 0.10868940783606376
09/23/2023 16:55:14 - INFO - __main__ -    global_step = 3300, average loss = 0.1209917226509424
09/23/2023 16:58:59 - INFO - __main__ -    global_step = 3350, average loss = 0.1191260662042896
09/23/2023 17:02:41 - INFO - __main__ -    global_step = 3400, average loss = 0.1174743126919202
09/23/2023 17:06:26 - INFO - __main__ -    global_step = 3450, average loss = 0.100895225374843
09/23/2023 17:10:02 - INFO - __main__ -    global_step = 3500, average loss = 0.0931866138278565
09/23/2023 17:10:03 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 17:10:03 - INFO - __main__ -     Num examples = 10000
09/23/2023 17:10:03 - INFO - __main__ -     Batch size = 16
09/23/2023 17:13:58 - INFO - __main__ -   ***** Eval results *****
09/23/2023 17:13:58 - INFO - __main__ -     acc = 0.8229
09/23/2023 17:17:45 - INFO - __main__ -    global_step = 3550, average loss = 0.10633477224648231
09/23/2023 17:21:30 - INFO - __main__ -    global_step = 3600, average loss = 0.1021722938354651
09/23/2023 17:25:11 - INFO - __main__ -    global_step = 3650, average loss = 0.10295378862727375
09/23/2023 17:28:50 - INFO - __main__ -    global_step = 3700, average loss = 0.1024187771679135
09/23/2023 17:32:34 - INFO - __main__ -    global_step = 3750, average loss = 0.09922411829451448
09/23/2023 17:36:14 - INFO - __main__ -    global_step = 3800, average loss = 0.11105157318372222
09/23/2023 17:39:57 - INFO - __main__ -    global_step = 3850, average loss = 0.12378941989987652
09/23/2023 17:43:42 - INFO - __main__ -    global_step = 3900, average loss = 0.1034327056143593
09/23/2023 17:47:25 - INFO - __main__ -    global_step = 3950, average loss = 0.09697925167827634
09/23/2023 17:51:09 - INFO - __main__ -    global_step = 4000, average loss = 0.11230336717126192
09/23/2023 17:51:09 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 17:51:09 - INFO - __main__ -     Num examples = 10000
09/23/2023 17:51:09 - INFO - __main__ -     Batch size = 16
09/23/2023 17:55:05 - INFO - __main__ -   ***** Eval results *****
09/23/2023 17:55:05 - INFO - __main__ -     acc = 0.8371
09/23/2023 17:55:32 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 17:59:12 - INFO - __main__ -    global_step = 4050, average loss = 0.10925351051962934
09/23/2023 18:03:00 - INFO - __main__ -    global_step = 4100, average loss = 0.09795216493275802
09/23/2023 18:06:43 - INFO - __main__ -    global_step = 4150, average loss = 0.09962472554965643
09/23/2023 18:10:25 - INFO - __main__ -    global_step = 4200, average loss = 0.10342389734141762
09/23/2023 18:14:05 - INFO - __main__ -    global_step = 4250, average loss = 0.09674815248567029
09/23/2023 18:17:48 - INFO - __main__ -    global_step = 4300, average loss = 0.10319628210134396
09/23/2023 18:21:33 - INFO - __main__ -    global_step = 4350, average loss = 0.09340641272166977
09/23/2023 18:25:14 - INFO - __main__ -    global_step = 4400, average loss = 0.10845618240913608
09/23/2023 18:28:59 - INFO - __main__ -    global_step = 4450, average loss = 0.11604906246473547
09/23/2023 18:32:43 - INFO - __main__ -    global_step = 4500, average loss = 0.09590314964269055
09/23/2023 18:32:43 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 18:32:43 - INFO - __main__ -     Num examples = 10000
09/23/2023 18:32:43 - INFO - __main__ -     Batch size = 16
09/23/2023 18:36:38 - INFO - __main__ -   ***** Eval results *****
09/23/2023 18:36:38 - INFO - __main__ -     acc = 0.8305
09/23/2023 18:40:22 - INFO - __main__ -    global_step = 4550, average loss = 0.09955280199857952
09/23/2023 18:44:07 - INFO - __main__ -    global_step = 4600, average loss = 0.09018894311768236
09/23/2023 18:47:49 - INFO - __main__ -    global_step = 4650, average loss = 0.11624654464081687
09/23/2023 18:51:30 - INFO - __main__ -    global_step = 4700, average loss = 0.11213955332923434
09/23/2023 18:55:07 - INFO - __main__ -    global_step = 4750, average loss = 0.11335175217776851
09/23/2023 18:58:47 - INFO - __main__ -    global_step = 4800, average loss = 0.10374061681199237
09/23/2023 19:02:34 - INFO - __main__ -    global_step = 4850, average loss = 0.09650620453016018
09/23/2023 19:06:16 - INFO - __main__ -    global_step = 4900, average loss = 0.1034209698169434
09/23/2023 19:09:53 - INFO - __main__ -    global_step = 4950, average loss = 0.10046588191311458
09/23/2023 19:13:34 - INFO - __main__ -    global_step = 5000, average loss = 0.10752027794980677
09/23/2023 19:13:34 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 19:13:34 - INFO - __main__ -     Num examples = 10000
09/23/2023 19:13:34 - INFO - __main__ -     Batch size = 16
09/23/2023 19:17:29 - INFO - __main__ -   ***** Eval results *****
09/23/2023 19:17:29 - INFO - __main__ -     acc = 0.8355
09/23/2023 19:21:19 - INFO - __main__ -    global_step = 5050, average loss = 0.10195030277842307
09/23/2023 19:24:58 - INFO - __main__ -    global_step = 5100, average loss = 0.10987481483532065
09/23/2023 19:28:41 - INFO - __main__ -    global_step = 5150, average loss = 0.10906005093554995
09/23/2023 19:32:23 - INFO - __main__ -    global_step = 5200, average loss = 0.09835696181547973
09/23/2023 19:36:06 - INFO - __main__ -    global_step = 5250, average loss = 0.10181126694624254
09/23/2023 19:39:52 - INFO - __main__ -    global_step = 5300, average loss = 0.08663028705283068
09/23/2023 19:43:30 - INFO - __main__ -    global_step = 5350, average loss = 0.10507196654667496
09/23/2023 19:47:18 - INFO - __main__ -    global_step = 5400, average loss = 0.108608085659871
09/23/2023 19:51:03 - INFO - __main__ -    global_step = 5450, average loss = 0.099619501844536
09/23/2023 19:54:49 - INFO - __main__ -    global_step = 5500, average loss = 0.10225338533447939
09/23/2023 19:54:49 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 19:54:49 - INFO - __main__ -     Num examples = 10000
09/23/2023 19:54:49 - INFO - __main__ -     Batch size = 16
09/23/2023 19:58:45 - INFO - __main__ -   ***** Eval results *****
09/23/2023 19:58:45 - INFO - __main__ -     acc = 0.8279
09/23/2023 20:02:26 - INFO - __main__ -    global_step = 5550, average loss = 0.10436682683890468
09/23/2023 20:06:11 - INFO - __main__ -    global_step = 5600, average loss = 0.10477761221260153
09/23/2023 20:09:52 - INFO - __main__ -    global_step = 5650, average loss = 0.09326410317778937
09/23/2023 20:13:31 - INFO - __main__ -    global_step = 5700, average loss = 0.11269167278223904
09/23/2023 20:17:16 - INFO - __main__ -    global_step = 5750, average loss = 0.10188864256499074
09/23/2023 20:21:00 - INFO - __main__ -    global_step = 5800, average loss = 0.10433580860199981
09/23/2023 20:24:43 - INFO - __main__ -    global_step = 5850, average loss = 0.08972063858884212
09/23/2023 20:28:22 - INFO - __main__ -    global_step = 5900, average loss = 0.1065664726671821
09/23/2023 20:32:07 - INFO - __main__ -    global_step = 5950, average loss = 0.10174332244623656
09/23/2023 20:35:49 - INFO - __main__ -    global_step = 6000, average loss = 0.08872646622621687
09/23/2023 20:35:49 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 20:35:49 - INFO - __main__ -     Num examples = 10000
09/23/2023 20:35:49 - INFO - __main__ -     Batch size = 16
09/23/2023 20:39:45 - INFO - __main__ -   ***** Eval results *****
09/23/2023 20:39:45 - INFO - __main__ -     acc = 0.8363
09/23/2023 20:43:29 - INFO - __main__ -    global_step = 6050, average loss = 0.10705330887685705
09/23/2023 20:47:16 - INFO - __main__ -    global_step = 6100, average loss = 0.09171272950654384
09/23/2023 20:50:59 - INFO - __main__ -    global_step = 6150, average loss = 0.0861645900901567
09/23/2023 20:54:46 - INFO - __main__ -    global_step = 6200, average loss = 0.08994678908144124
09/23/2023 20:58:32 - INFO - __main__ -    global_step = 6250, average loss = 0.08786970607354305
09/23/2023 21:02:13 - INFO - __main__ -    global_step = 6300, average loss = 0.09656520821336016
09/23/2023 21:05:56 - INFO - __main__ -    global_step = 6350, average loss = 0.09620310332989902
09/23/2023 21:09:42 - INFO - __main__ -    global_step = 6400, average loss = 0.09152124080545036
09/23/2023 21:13:22 - INFO - __main__ -    global_step = 6450, average loss = 0.09472263304131047
09/23/2023 21:17:06 - INFO - __main__ -    global_step = 6500, average loss = 0.10554198697194807
09/23/2023 21:17:06 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 21:17:06 - INFO - __main__ -     Num examples = 10000
09/23/2023 21:17:06 - INFO - __main__ -     Batch size = 16
09/23/2023 21:21:01 - INFO - __main__ -   ***** Eval results *****
09/23/2023 21:21:01 - INFO - __main__ -     acc = 0.841
09/23/2023 21:21:28 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 21:25:14 - INFO - __main__ -    global_step = 6550, average loss = 0.09830655160796596
09/23/2023 21:28:55 - INFO - __main__ -    global_step = 6600, average loss = 0.09539545015402837
09/23/2023 21:32:40 - INFO - __main__ -    global_step = 6650, average loss = 0.09118585625503328
09/23/2023 21:36:18 - INFO - __main__ -    global_step = 6700, average loss = 0.09700520555491493
09/23/2023 21:40:03 - INFO - __main__ -    global_step = 6750, average loss = 0.105271778342576
09/23/2023 21:43:45 - INFO - __main__ -    global_step = 6800, average loss = 0.10975144471223758
09/23/2023 21:47:28 - INFO - __main__ -    global_step = 6850, average loss = 0.09920243133579788
09/23/2023 21:51:11 - INFO - __main__ -    global_step = 6900, average loss = 0.09791661702009151
09/23/2023 21:54:51 - INFO - __main__ -    global_step = 6950, average loss = 0.08630025177910283
09/23/2023 21:58:29 - INFO - __main__ -    global_step = 7000, average loss = 0.09660528897402401
09/23/2023 21:58:29 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 21:58:29 - INFO - __main__ -     Num examples = 10000
09/23/2023 21:58:29 - INFO - __main__ -     Batch size = 16
09/23/2023 22:02:25 - INFO - __main__ -   ***** Eval results *****
09/23/2023 22:02:25 - INFO - __main__ -     acc = 0.843
09/23/2023 22:02:51 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 22:06:33 - INFO - __main__ -    global_step = 7050, average loss = 0.10305566756385814
09/23/2023 22:10:07 - INFO - __main__ -    global_step = 7100, average loss = 0.10687436608219286
09/23/2023 22:13:47 - INFO - __main__ -    global_step = 7150, average loss = 0.0946133067667688
09/23/2023 22:17:27 - INFO - __main__ -    global_step = 7200, average loss = 0.09795189084834419
09/23/2023 22:21:17 - INFO - __main__ -    global_step = 7250, average loss = 0.09060888570308634
09/23/2023 22:24:59 - INFO - __main__ -    global_step = 7300, average loss = 0.0877145413684775
09/23/2023 22:28:35 - INFO - __main__ -    global_step = 7350, average loss = 0.10495714643941029
09/23/2023 22:32:21 - INFO - __main__ -    global_step = 7400, average loss = 0.07401456630654138
09/23/2023 22:36:03 - INFO - __main__ -    global_step = 7450, average loss = 0.09523518772701209
09/23/2023 22:39:41 - INFO - __main__ -    global_step = 7500, average loss = 0.10137952610446518
09/23/2023 22:39:41 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 22:39:41 - INFO - __main__ -     Num examples = 10000
09/23/2023 22:39:41 - INFO - __main__ -     Batch size = 16
09/23/2023 22:43:37 - INFO - __main__ -   ***** Eval results *****
09/23/2023 22:43:37 - INFO - __main__ -     acc = 0.846
09/23/2023 22:44:03 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 22:47:46 - INFO - __main__ -    global_step = 7550, average loss = 0.09563293447645264
09/23/2023 22:51:31 - INFO - __main__ -    global_step = 7600, average loss = 0.09618103489105125
09/23/2023 22:55:13 - INFO - __main__ -    global_step = 7650, average loss = 0.08849806944810552
09/23/2023 22:58:54 - INFO - __main__ -    global_step = 7700, average loss = 0.10007433392238455
09/23/2023 23:02:36 - INFO - __main__ -    global_step = 7750, average loss = 0.09035434001329122
09/23/2023 23:06:24 - INFO - __main__ -    global_step = 7800, average loss = 0.09338357288788757
09/23/2023 23:10:04 - INFO - __main__ -    global_step = 7850, average loss = 0.09912064949181514
09/23/2023 23:13:47 - INFO - __main__ -    global_step = 7900, average loss = 0.08827902228244057
09/23/2023 23:17:27 - INFO - __main__ -    global_step = 7950, average loss = 0.11218067690118914
09/23/2023 23:21:09 - INFO - __main__ -    global_step = 8000, average loss = 0.08588292430682486
09/23/2023 23:21:09 - INFO - __main__ -   ***** Running evaluation *****
09/23/2023 23:21:09 - INFO - __main__ -     Num examples = 10000
09/23/2023 23:21:09 - INFO - __main__ -     Batch size = 16
09/23/2023 23:25:05 - INFO - __main__ -   ***** Eval results *****
09/23/2023 23:25:05 - INFO - __main__ -     acc = 0.8472
09/23/2023 23:25:31 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 23:29:08 - INFO - __main__ -    global_step = 8050, average loss = 0.09245043838061974
09/23/2023 23:32:54 - INFO - __main__ -    global_step = 8100, average loss = 0.08283289226481429
09/23/2023 23:36:34 - INFO - __main__ -    global_step = 8150, average loss = 0.08407623038449856
09/23/2023 23:40:17 - INFO - __main__ -    global_step = 8200, average loss = 0.09736820162237564
09/23/2023 23:44:06 - INFO - __main__ -    global_step = 8250, average loss = 0.08463705457368632
09/23/2023 23:47:50 - INFO - __main__ -    global_step = 8300, average loss = 0.10010304888644896
09/23/2023 23:51:35 - INFO - __main__ -    global_step = 8350, average loss = 0.09222401980725409
09/23/2023 23:55:17 - INFO - __main__ -    global_step = 8400, average loss = 0.08634746881416504
09/23/2023 23:58:59 - INFO - __main__ -    global_step = 8450, average loss = 0.08723288500368653
09/24/2023 00:02:37 - INFO - __main__ -    global_step = 8500, average loss = 0.10130320921433394
09/24/2023 00:02:37 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 00:02:37 - INFO - __main__ -     Num examples = 10000
09/24/2023 00:02:37 - INFO - __main__ -     Batch size = 16
09/24/2023 00:06:32 - INFO - __main__ -   ***** Eval results *****
09/24/2023 00:06:32 - INFO - __main__ -     acc = 0.8452
09/24/2023 00:10:13 - INFO - __main__ -    global_step = 8550, average loss = 0.0889340414837352
09/24/2023 00:13:53 - INFO - __main__ -    global_step = 8600, average loss = 0.0960574367789377
09/24/2023 00:17:37 - INFO - __main__ -    global_step = 8650, average loss = 0.07860265792332939
09/24/2023 00:21:20 - INFO - __main__ -    global_step = 8700, average loss = 0.09233207383847912
09/24/2023 00:25:05 - INFO - __main__ -    global_step = 8750, average loss = 0.09803196908305836
09/24/2023 00:28:44 - INFO - __main__ -    global_step = 8800, average loss = 0.08913468146740343
09/24/2023 00:32:26 - INFO - __main__ -    global_step = 8850, average loss = 0.0880054514182666
09/24/2023 00:36:11 - INFO - __main__ -    global_step = 8900, average loss = 0.0839999437017832
09/24/2023 00:39:52 - INFO - __main__ -    global_step = 8950, average loss = 0.10094311676693905
09/24/2023 00:43:32 - INFO - __main__ -    global_step = 9000, average loss = 0.10011614485312748
09/24/2023 00:43:32 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 00:43:32 - INFO - __main__ -     Num examples = 10000
09/24/2023 00:43:32 - INFO - __main__ -     Batch size = 16
09/24/2023 00:47:27 - INFO - __main__ -   ***** Eval results *****
09/24/2023 00:47:27 - INFO - __main__ -     acc = 0.8463
09/24/2023 00:51:10 - INFO - __main__ -    global_step = 9050, average loss = 0.09407024829903093
09/24/2023 00:54:48 - INFO - __main__ -    global_step = 9100, average loss = 0.09510339217069032
09/24/2023 00:58:27 - INFO - __main__ -    global_step = 9150, average loss = 0.09413513723055075
09/24/2023 01:02:10 - INFO - __main__ -    global_step = 9200, average loss = 0.08488880819528276
09/24/2023 01:05:47 - INFO - __main__ -    global_step = 9250, average loss = 0.09847264970565447
09/24/2023 01:09:28 - INFO - __main__ -    global_step = 9300, average loss = 0.08640140883806452
09/24/2023 01:13:08 - INFO - __main__ -    global_step = 9350, average loss = 0.07884123000112594
09/24/2023 01:16:54 - INFO - __main__ -    global_step = 9400, average loss = 0.0831154512307694
09/24/2023 01:20:32 - INFO - __main__ -    global_step = 9450, average loss = 0.09913980022422038
09/24/2023 01:24:11 - INFO - __main__ -    global_step = 9500, average loss = 0.09805536182444484
09/24/2023 01:24:11 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 01:24:11 - INFO - __main__ -     Num examples = 10000
09/24/2023 01:24:11 - INFO - __main__ -     Batch size = 16
09/24/2023 01:28:07 - INFO - __main__ -   ***** Eval results *****
09/24/2023 01:28:07 - INFO - __main__ -     acc = 0.8463
09/24/2023 01:31:55 - INFO - __main__ -    global_step = 9550, average loss = 0.0912455873134968
09/24/2023 01:35:38 - INFO - __main__ -    global_step = 9600, average loss = 0.10278063782119716
09/24/2023 01:39:12 - INFO - __main__ -    global_step = 9650, average loss = 0.08788584528032516
09/24/2023 01:42:53 - INFO - __main__ -    global_step = 9700, average loss = 0.08058010207216285
09/24/2023 01:46:34 - INFO - __main__ -    global_step = 9750, average loss = 0.08765123128723644
09/24/2023 01:50:14 - INFO - __main__ -    global_step = 9800, average loss = 0.09005017607181799
09/24/2023 01:54:03 - INFO - __main__ -    global_step = 9850, average loss = 0.07892634223760979
09/24/2023 01:57:44 - INFO - __main__ -    global_step = 9900, average loss = 0.07999062808303278
09/24/2023 02:01:26 - INFO - __main__ -    global_step = 9950, average loss = 0.09494447313452838
09/24/2023 02:05:06 - INFO - __main__ -    global_step = 10000, average loss = 0.0841888710015337
09/24/2023 02:05:06 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 02:05:06 - INFO - __main__ -     Num examples = 10000
09/24/2023 02:05:06 - INFO - __main__ -     Batch size = 16
09/24/2023 02:09:01 - INFO - __main__ -   ***** Eval results *****
09/24/2023 02:09:01 - INFO - __main__ -     acc = 0.8471
09/24/2023 02:12:40 - INFO - __main__ -    global_step = 10050, average loss = 0.08929907138342968
09/24/2023 02:16:20 - INFO - __main__ -    global_step = 10100, average loss = 0.10172551687661326
09/24/2023 02:20:00 - INFO - __main__ -    global_step = 10150, average loss = 0.09577305402533966
09/24/2023 02:23:46 - INFO - __main__ -    global_step = 10200, average loss = 0.09480085656211486
09/24/2023 02:27:27 - INFO - __main__ -    global_step = 10250, average loss = 0.07956519629078684
09/24/2023 02:31:05 - INFO - __main__ -    global_step = 10300, average loss = 0.08291967767250753
09/24/2023 02:34:47 - INFO - __main__ -    global_step = 10350, average loss = 0.09592102762369904
09/24/2023 02:38:29 - INFO - __main__ -    global_step = 10400, average loss = 0.08570889301292482
09/24/2023 02:42:13 - INFO - __main__ -    global_step = 10450, average loss = 0.07362440132081247
09/24/2023 02:45:58 - INFO - __main__ -    global_step = 10500, average loss = 0.08574875552483718
09/24/2023 02:45:58 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 02:45:58 - INFO - __main__ -     Num examples = 10000
09/24/2023 02:45:58 - INFO - __main__ -     Batch size = 16
09/24/2023 02:49:53 - INFO - __main__ -   ***** Eval results *****
09/24/2023 02:49:53 - INFO - __main__ -     acc = 0.8524
09/24/2023 02:50:20 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 02:54:03 - INFO - __main__ -    global_step = 10550, average loss = 0.08846153970320302
09/24/2023 02:57:43 - INFO - __main__ -    global_step = 10600, average loss = 0.08381684645668429
09/24/2023 03:01:26 - INFO - __main__ -    global_step = 10650, average loss = 0.09288432469184045
09/24/2023 03:05:08 - INFO - __main__ -    global_step = 10700, average loss = 0.08199916316298186
09/24/2023 03:08:56 - INFO - __main__ -    global_step = 10750, average loss = 0.09068042659768252
09/24/2023 03:12:37 - INFO - __main__ -    global_step = 10800, average loss = 0.08719110449641448
09/24/2023 03:16:20 - INFO - __main__ -    global_step = 10850, average loss = 0.09036207084544003
09/24/2023 03:20:04 - INFO - __main__ -    global_step = 10900, average loss = 0.095746248819637
09/24/2023 03:23:45 - INFO - __main__ -    global_step = 10950, average loss = 0.1019882604497252
09/24/2023 03:27:25 - INFO - __main__ -    global_step = 11000, average loss = 0.08660416512644588
09/24/2023 03:27:25 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 03:27:25 - INFO - __main__ -     Num examples = 10000
09/24/2023 03:27:25 - INFO - __main__ -     Batch size = 16
09/24/2023 03:31:21 - INFO - __main__ -   ***** Eval results *****
09/24/2023 03:31:21 - INFO - __main__ -     acc = 0.8521
09/24/2023 03:35:00 - INFO - __main__ -    global_step = 11050, average loss = 0.07959849048202158
09/24/2023 03:38:42 - INFO - __main__ -    global_step = 11100, average loss = 0.08480279741248524
09/24/2023 03:42:25 - INFO - __main__ -    global_step = 11150, average loss = 0.07940411141982623
09/24/2023 03:46:06 - INFO - __main__ -    global_step = 11200, average loss = 0.08627346496621613
09/24/2023 03:49:48 - INFO - __main__ -    global_step = 11250, average loss = 0.08515130840663915
09/24/2023 03:53:28 - INFO - __main__ -    global_step = 11300, average loss = 0.08047833000106039
09/24/2023 03:57:07 - INFO - __main__ -    global_step = 11350, average loss = 0.08884227124826338
09/24/2023 04:00:47 - INFO - __main__ -    global_step = 11400, average loss = 0.09542614945773494
09/24/2023 04:04:26 - INFO - __main__ -    global_step = 11450, average loss = 0.08332637125422479
09/24/2023 04:08:07 - INFO - __main__ -    global_step = 11500, average loss = 0.09769482501476887
09/24/2023 04:08:07 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 04:08:07 - INFO - __main__ -     Num examples = 10000
09/24/2023 04:08:07 - INFO - __main__ -     Batch size = 16
09/24/2023 04:12:02 - INFO - __main__ -   ***** Eval results *****
09/24/2023 04:12:02 - INFO - __main__ -     acc = 0.851
09/24/2023 04:15:51 - INFO - __main__ -    global_step = 11550, average loss = 0.09137944790694746
09/24/2023 04:19:38 - INFO - __main__ -    global_step = 11600, average loss = 0.07454582622590351
09/24/2023 04:23:20 - INFO - __main__ -    global_step = 11650, average loss = 0.08284565404814202
09/24/2023 04:26:59 - INFO - __main__ -    global_step = 11700, average loss = 0.0969824349215196
09/24/2023 04:30:41 - INFO - __main__ -    global_step = 11750, average loss = 0.09389037321489013
09/24/2023 04:34:23 - INFO - __main__ -    global_step = 11800, average loss = 0.08608788483528769
09/24/2023 04:38:05 - INFO - __main__ -    global_step = 11850, average loss = 0.09322659247220144
09/24/2023 04:41:49 - INFO - __main__ -    global_step = 11900, average loss = 0.09286965438863262
09/24/2023 04:45:31 - INFO - __main__ -    global_step = 11950, average loss = 0.08214385434631367
09/24/2023 04:49:12 - INFO - __main__ -    global_step = 12000, average loss = 0.09392224536069989
09/24/2023 04:49:12 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 04:49:12 - INFO - __main__ -     Num examples = 10000
09/24/2023 04:49:12 - INFO - __main__ -     Batch size = 16
09/24/2023 04:53:07 - INFO - __main__ -   ***** Eval results *****
09/24/2023 04:53:07 - INFO - __main__ -     acc = 0.8514
09/24/2023 04:56:53 - INFO - __main__ -    global_step = 12050, average loss = 0.08019034011129406
09/24/2023 05:00:34 - INFO - __main__ -    global_step = 12100, average loss = 0.08210711618239656
09/24/2023 05:04:16 - INFO - __main__ -    global_step = 12150, average loss = 0.08764273267355747
09/24/2023 05:08:02 - INFO - __main__ -    global_step = 12200, average loss = 0.08758470895321807
09/24/2023 05:11:48 - INFO - __main__ -    global_step = 12250, average loss = 0.07766548367973883
09/24/2023 05:15:27 - INFO - __main__ -    global_step = 12300, average loss = 0.08148344823415755
09/24/2023 05:19:08 - INFO - __main__ -    global_step = 12350, average loss = 0.08814196670609817
09/24/2023 05:22:50 - INFO - __main__ -    global_step = 12400, average loss = 0.08936668847491092
09/24/2023 05:26:29 - INFO - __main__ -    global_step = 12450, average loss = 0.08240065188347216
09/24/2023 05:30:12 - INFO - __main__ -    global_step = 12500, average loss = 0.08683115135392655
09/24/2023 05:30:12 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 05:30:12 - INFO - __main__ -     Num examples = 10000
09/24/2023 05:30:12 - INFO - __main__ -     Batch size = 16
09/24/2023 05:34:07 - INFO - __main__ -   ***** Eval results *****
09/24/2023 05:34:07 - INFO - __main__ -     acc = 0.8515
09/24/2023 05:37:53 - INFO - __main__ -    global_step = 12550, average loss = 0.08871277472944712
09/24/2023 05:41:34 - INFO - __main__ -    global_step = 12600, average loss = 0.08797626828309149
09/24/2023 05:45:11 - INFO - __main__ -    global_step = 12650, average loss = 0.10095825259459616
09/24/2023 05:48:58 - INFO - __main__ -    global_step = 12700, average loss = 0.07953012495926487
09/24/2023 05:52:41 - INFO - __main__ -    global_step = 12750, average loss = 0.08843418272979761
09/24/2023 05:56:19 - INFO - __main__ -    global_step = 12800, average loss = 0.07413991435227217
09/24/2023 05:59:59 - INFO - __main__ -    global_step = 12850, average loss = 0.07519575585451094
09/24/2023 06:03:48 - INFO - __main__ -    global_step = 12900, average loss = 0.08996981896292709
09/24/2023 06:07:28 - INFO - __main__ -    global_step = 12950, average loss = 0.08996171029284597
09/24/2023 06:11:11 - INFO - __main__ -    global_step = 13000, average loss = 0.08077499923689174
09/24/2023 06:11:11 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 06:11:11 - INFO - __main__ -     Num examples = 10000
09/24/2023 06:11:11 - INFO - __main__ -     Batch size = 16
09/24/2023 06:15:06 - INFO - __main__ -   ***** Eval results *****
09/24/2023 06:15:06 - INFO - __main__ -     acc = 0.8527
09/24/2023 06:15:33 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 06:19:13 - INFO - __main__ -    global_step = 13050, average loss = 0.08447560470420284
09/24/2023 06:22:54 - INFO - __main__ -    global_step = 13100, average loss = 0.08299598100831646
09/24/2023 06:26:32 - INFO - __main__ -    global_step = 13150, average loss = 0.08393764879734135
09/24/2023 06:30:08 - INFO - __main__ -    global_step = 13200, average loss = 0.09848508099505125
09/24/2023 06:33:47 - INFO - __main__ -    global_step = 13250, average loss = 0.09162080157435412
09/24/2023 06:37:28 - INFO - __main__ -    global_step = 13300, average loss = 0.0914362099875143
09/24/2023 06:41:09 - INFO - __main__ -    global_step = 13350, average loss = 0.07781068138462616
09/24/2023 06:44:55 - INFO - __main__ -    global_step = 13400, average loss = 0.08868030074576382
09/24/2023 06:48:36 - INFO - __main__ -    global_step = 13450, average loss = 0.08357623873533157
09/24/2023 06:52:18 - INFO - __main__ -    global_step = 13500, average loss = 0.08828085365807055
09/24/2023 06:52:18 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 06:52:18 - INFO - __main__ -     Num examples = 10000
09/24/2023 06:52:18 - INFO - __main__ -     Batch size = 16
09/24/2023 06:56:14 - INFO - __main__ -   ***** Eval results *****
09/24/2023 06:56:14 - INFO - __main__ -     acc = 0.8499
09/24/2023 06:59:57 - INFO - __main__ -    global_step = 13550, average loss = 0.08140521681067185
09/24/2023 07:03:37 - INFO - __main__ -    global_step = 13600, average loss = 0.08341409597109305
09/24/2023 07:07:17 - INFO - __main__ -    global_step = 13650, average loss = 0.08142950747031136
09/24/2023 07:10:56 - INFO - __main__ -    global_step = 13700, average loss = 0.09089667504686076
09/24/2023 07:14:45 - INFO - __main__ -    global_step = 13750, average loss = 0.07177684095106088
09/24/2023 07:18:24 - INFO - __main__ -    global_step = 13800, average loss = 0.08592368463818274
09/24/2023 07:22:01 - INFO - __main__ -    global_step = 13850, average loss = 0.08120634569131653
09/24/2023 07:25:48 - INFO - __main__ -    global_step = 13900, average loss = 0.08909589071197843
09/24/2023 07:29:30 - INFO - __main__ -    global_step = 13950, average loss = 0.08629100337015189
09/24/2023 07:33:10 - INFO - __main__ -    global_step = 14000, average loss = 0.07722124511306902
09/24/2023 07:33:10 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 07:33:10 - INFO - __main__ -     Num examples = 10000
09/24/2023 07:33:10 - INFO - __main__ -     Batch size = 16
09/24/2023 07:37:05 - INFO - __main__ -   ***** Eval results *****
09/24/2023 07:37:05 - INFO - __main__ -     acc = 0.8533
09/24/2023 07:37:32 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 07:41:11 - INFO - __main__ -    global_step = 14050, average loss = 0.08182521525057382
09/24/2023 07:44:48 - INFO - __main__ -    global_step = 14100, average loss = 0.0902410151962249
09/24/2023 07:48:28 - INFO - __main__ -    global_step = 14150, average loss = 0.07409664937826164
09/24/2023 07:52:12 - INFO - __main__ -    global_step = 14200, average loss = 0.08879891355274594
09/24/2023 07:55:53 - INFO - __main__ -    global_step = 14250, average loss = 0.09268313445325475
09/24/2023 07:59:30 - INFO - __main__ -    global_step = 14300, average loss = 0.08798344542199629
09/24/2023 08:03:13 - INFO - __main__ -    global_step = 14350, average loss = 0.09607475698139752
09/24/2023 08:06:59 - INFO - __main__ -    global_step = 14400, average loss = 0.07222031111843535
09/24/2023 08:10:40 - INFO - __main__ -    global_step = 14450, average loss = 0.07480319764195884
09/24/2023 08:14:19 - INFO - __main__ -    global_step = 14500, average loss = 0.0838716509303049
09/24/2023 08:14:19 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 08:14:19 - INFO - __main__ -     Num examples = 10000
09/24/2023 08:14:19 - INFO - __main__ -     Batch size = 16
09/24/2023 08:18:16 - INFO - __main__ -   ***** Eval results *****
09/24/2023 08:18:16 - INFO - __main__ -     acc = 0.8542
09/24/2023 08:18:42 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 08:22:18 - INFO - __main__ -    global_step = 14550, average loss = 0.08034001361316769
09/24/2023 08:25:55 - INFO - __main__ -    global_step = 14600, average loss = 0.07689567271547276
09/24/2023 08:29:37 - INFO - __main__ -    global_step = 14650, average loss = 0.09093381941405823
09/24/2023 08:33:25 - INFO - __main__ -    global_step = 14700, average loss = 0.07569706412876258
09/24/2023 08:37:04 - INFO - __main__ -    global_step = 14750, average loss = 0.07479940189456101
09/24/2023 08:40:47 - INFO - __main__ -    global_step = 14800, average loss = 0.08522207450543647
09/24/2023 08:44:34 - INFO - __main__ -    global_step = 14850, average loss = 0.0889268495763099
09/24/2023 08:48:16 - INFO - __main__ -    global_step = 14900, average loss = 0.08616152721479012
09/24/2023 08:51:56 - INFO - __main__ -    global_step = 14950, average loss = 0.07867321850848384
09/24/2023 08:55:39 - INFO - __main__ -    global_step = 15000, average loss = 0.08426695556714549
09/24/2023 08:55:39 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 08:55:39 - INFO - __main__ -     Num examples = 10000
09/24/2023 08:55:39 - INFO - __main__ -     Batch size = 16
09/24/2023 08:59:34 - INFO - __main__ -   ***** Eval results *****
09/24/2023 08:59:34 - INFO - __main__ -     acc = 0.8542
09/24/2023 09:03:12 - INFO - __main__ -    global_step = 15050, average loss = 0.07868185437655484
09/24/2023 09:07:00 - INFO - __main__ -    global_step = 15100, average loss = 0.08520105790423259
09/24/2023 09:10:42 - INFO - __main__ -    global_step = 15150, average loss = 0.09536004922925713
09/24/2023 09:14:19 - INFO - __main__ -    global_step = 15200, average loss = 0.08502999547665241
09/24/2023 09:17:58 - INFO - __main__ -    global_step = 15250, average loss = 0.08957034896484402
09/24/2023 09:21:34 - INFO - __main__ -    global_step = 15300, average loss = 0.07968287494033575
09/24/2023 09:25:14 - INFO - __main__ -    global_step = 15350, average loss = 0.08545487473544199
09/24/2023 09:28:55 - INFO - __main__ -    global_step = 15400, average loss = 0.08528959889241378
09/24/2023 09:32:38 - INFO - __main__ -    global_step = 15450, average loss = 0.08095955706679887
09/24/2023 09:36:19 - INFO - __main__ -    global_step = 15500, average loss = 0.08725373520917856
09/24/2023 09:36:19 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 09:36:19 - INFO - __main__ -     Num examples = 10000
09/24/2023 09:36:19 - INFO - __main__ -     Batch size = 16
09/24/2023 09:40:15 - INFO - __main__ -   ***** Eval results *****
09/24/2023 09:40:15 - INFO - __main__ -     acc = 0.8545
09/24/2023 09:40:42 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 09:44:22 - INFO - __main__ -    global_step = 15550, average loss = 0.0843266883040269
09/24/2023 09:48:03 - INFO - __main__ -    global_step = 15600, average loss = 0.07855528741223679
09/24/2023 09:51:47 - INFO - __main__ -    global_step = 15650, average loss = 0.09478737017554523
09/24/2023 09:55:32 - INFO - __main__ -    global_step = 15700, average loss = 0.08910313490487169
09/24/2023 09:59:16 - INFO - __main__ -    global_step = 15750, average loss = 0.07736712342710234
09/24/2023 10:02:53 - INFO - __main__ -    global_step = 15800, average loss = 0.08501649839432503
09/24/2023 10:06:37 - INFO - __main__ -    global_step = 15850, average loss = 0.08495221398276044
09/24/2023 10:10:23 - INFO - __main__ -    global_step = 15900, average loss = 0.08510145512744202
09/24/2023 10:14:07 - INFO - __main__ -    global_step = 15950, average loss = 0.08335533107921947
09/24/2023 10:17:49 - INFO - __main__ -    global_step = 16000, average loss = 0.09103241352764599
09/24/2023 10:17:49 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 10:17:49 - INFO - __main__ -     Num examples = 10000
09/24/2023 10:17:49 - INFO - __main__ -     Batch size = 16
09/24/2023 10:21:45 - INFO - __main__ -   ***** Eval results *****
09/24/2023 10:21:45 - INFO - __main__ -     acc = 0.8549
09/24/2023 10:22:12 - INFO - __main__ -   Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 10:25:53 - INFO - __main__ -    global_step = 16050, average loss = 0.0808029190406296
09/24/2023 10:29:33 - INFO - __main__ -    global_step = 16100, average loss = 0.0950222506766113
09/24/2023 10:33:15 - INFO - __main__ -    global_step = 16150, average loss = 0.08560644885961664
09/24/2023 10:36:53 - INFO - __main__ -    global_step = 16200, average loss = 0.07925290400889935
09/24/2023 10:40:34 - INFO - __main__ -    global_step = 16250, average loss = 0.08252620983123052
09/24/2023 10:44:15 - INFO - __main__ -    global_step = 16300, average loss = 0.08747977073326182
09/24/2023 10:47:55 - INFO - __main__ -    global_step = 16350, average loss = 0.08805208059333382
09/24/2023 10:51:41 - INFO - __main__ -    global_step = 16400, average loss = 0.07935831163018064
09/24/2023 10:55:23 - INFO - __main__ -    global_step = 16450, average loss = 0.0807358610859228
09/24/2023 10:59:03 - INFO - __main__ -    global_step = 16500, average loss = 0.0775301494665473
09/24/2023 10:59:03 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 10:59:03 - INFO - __main__ -     Num examples = 10000
09/24/2023 10:59:03 - INFO - __main__ -     Batch size = 16
09/24/2023 11:02:59 - INFO - __main__ -   ***** Eval results *****
09/24/2023 11:02:59 - INFO - __main__ -     acc = 0.8532
09/24/2023 11:06:39 - INFO - __main__ -    global_step = 16550, average loss = 0.06899339191091712
09/24/2023 11:10:25 - INFO - __main__ -    global_step = 16600, average loss = 0.08612027997849508
09/24/2023 11:14:10 - INFO - __main__ -    global_step = 16650, average loss = 0.08232147437905951
09/24/2023 11:17:50 - INFO - __main__ -    global_step = 16700, average loss = 0.08530993062430753
09/24/2023 11:18:50 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 11:18:50 - INFO - __main__ -     Num examples = 10000
09/24/2023 11:18:50 - INFO - __main__ -     Batch size = 16
09/24/2023 11:22:45 - INFO - __main__ -   ***** Eval results *****
09/24/2023 11:22:45 - INFO - __main__ -     acc = 0.8533
09/24/2023 11:22:45 - INFO - __main__ -    global_step = 16713, average loss = 0.11041826268834619
09/24/2023 11:23:18 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 11:23:18 - INFO - __main__ -     Num examples = 10000
09/24/2023 11:23:18 - INFO - __main__ -     Batch size = 16
09/24/2023 11:27:13 - INFO - __main__ -   ***** Eval results *****
09/24/2023 11:27:13 - INFO - __main__ -     acc = 0.8549
09/24/2023 11:27:16 - INFO - evaluate_DeBERTa -   Namespace(dataset_file='../../../data/mcqa/eval/socialiqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='socialiqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:27:16 - INFO - evaluate_DeBERTa -   Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:34:38 - INFO - evaluate_DeBERTa -   Namespace(dataset_file='../../../data/mcqa/eval/winogrande_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='winogrande', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:34:38 - INFO - evaluate_DeBERTa -   Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:37:05 - INFO - evaluate_DeBERTa -   Namespace(dataset_file='../../../data/mcqa/eval/piqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='piqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:37:05 - INFO - evaluate_DeBERTa -   Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:43:59 - INFO - evaluate_DeBERTa -   Namespace(dataset_file='../../../data/mcqa/eval/commonsenseqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='commonsenseqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:43:59 - INFO - evaluate_DeBERTa -   Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:49:43 - INFO - evaluate_DeBERTa -   Namespace(dataset_file='../../../data/mcqa/eval/anli_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='anli', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:49:43 - INFO - evaluate_DeBERTa -   Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:54:31 - INFO - __main__ -   ***** Running evaluation *****
09/24/2023 11:54:31 - INFO - __main__ -     Num examples = 120
09/24/2023 11:54:31 - INFO - __main__ -     Batch size = 16
09/24/2023 11:54:47 - INFO - __main__ -   ***** Eval results *****
09/24/2023 11:54:47 - INFO - __main__ -     acc = 0.525