INFO: 2024-08-28 09:38:05,098: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom']
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:07,960: llmtf.base.darumeru/PARus: Loading Dataset: 2.86s
INFO: 2024-08-28 09:38:07,992: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive']
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:09,381: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu']
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Processing Dataset: 3.52s
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-08-28 09:38:11,487: llmtf.base.darumeru/PARus: {'acc': 0.61}
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,100: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,501: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s
INFO: 2024-08-28 09:38:13,809: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use']
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:15,664: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru']
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:16,575: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.58s
INFO: 2024-08-28 09:38:17,521: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-08-28 09:38:17,560: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.46s
INFO: 2024-08-28 09:38:17,771: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en']
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:18,163: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.50s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Processing Dataset: 5.47s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-08-28 09:38:18,971: llmtf.base.darumeru/RCB: {'acc': 0.4590909090909091, 'f1_macro': 0.41511023060616065}
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:20,244: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.47s
INFO: 2024-08-28 09:38:21,566: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.59s
INFO: 2024-08-28 09:38:59,395: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 37.83s
INFO: 2024-08-28 09:38:59,396: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-08-28 09:38:59,405: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7246563573883161, 'f1_macro': 0.7254261079279148}
INFO: 2024-08-28 09:38:59,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:59,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:01,164: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 1.75s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.01s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-08-28 09:39:03,179: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8640387481371088}
INFO: 2024-08-28 09:39:03,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:03,180: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:05,009: llmtf.base.darumeru/RWSD: Loading Dataset: 1.83s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Processing Dataset: 5.96s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-08-28 09:39:10,970: llmtf.base.darumeru/RWSD: {'acc': 0.5686274509803921}
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:14,992: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.02s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 43.71s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-08-28 09:39:58,716: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7100825260136348, 'mcc': 0.2607217783495962}
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.615	0.610	0.437	0.569	0.725	0.865	0.485
INFO: 2024-08-28 09:41:10,442: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 170.20s
INFO: 2024-08-28 09:41:10,443: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-08-28 09:41:10,444: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.394786509645572, 'len': 0.9984205596649908, 'lcs': 0.9939024390243902}
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:12,462: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.02s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 177.42s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-08-28 09:41:15,581: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.705603375484548, 'len': 0.9949612573353719, 'lcs': 0.9404517453798767}
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:17,523: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.94s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Processing Dataset: 143.27s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-08-28 09:43:35,735: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.521261233892721, 'len': 0.9996383092887717, 'lcs': 1.0}
INFO: 2024-08-28 09:43:35,735: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:35,736: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.743	0.610	0.437	0.569	1.000	0.998	0.995	0.725	0.865	0.485
INFO: 2024-08-28 09:43:52,340: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 342.96s
INFO: 2024-08-28 09:43:56,137: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 158.61s
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7865670306157933, 'len': 0.9983390643026695, 'lcs': 0.97}
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.765	0.610	0.437	0.569	1.000	0.970	0.998	0.995	0.725	0.865	0.485
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Processing Dataset: 355.94s
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-08-28 09:44:12,524: llmtf.base.darumeru/ruMMLU: {'acc': 0.5018457547640427}
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:25,707: llmtf.base.daru/treewayextractive: Loading Dataset: 13.14s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Processing Dataset: 376.40s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-08-28 09:44:33,922: llmtf.base.darumeru/MultiQ: {'f1': 0.2660316706577536, 'em': 0.15487571701720843}
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:37,006: llmtf.base.darumeru/USE: Loading Dataset: 3.08s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Processing Dataset: 452.34s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-08-28 09:45:49,906: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3550759319426331, 'rouge2': 0.12663323877762525}
INFO: 2024-08-28 09:45:49,909: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:45:49,910: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.662	0.241	0.210	0.610	0.437	0.569	1.000	0.970	0.998	0.995	0.502	0.725	0.865	0.485
INFO: 2024-08-28 09:48:00,114: llmtf.base.darumeru/USE: Processing Dataset: 203.11s
INFO: 2024-08-28 09:48:00,115: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-08-28 09:48:00,116: llmtf.base.darumeru/USE: {'grade_norm': 0.0931372549019608}
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.622	0.241	0.210	0.610	0.437	0.569	0.093	1.000	0.970	0.998	0.995	0.502	0.725	0.865	0.485
INFO: 2024-08-28 09:49:59,058: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.72s
INFO: 2024-08-28 09:49:59,059: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-08-28 09:49:59,121: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.270000
anatomy                              0.400000
astronomy                            0.677632
business_ethics                      0.560000
clinical_knowledge                   0.584906
college_biology                      0.506944
college_chemistry                    0.420000
college_computer_science             0.430000
college_mathematics                  0.310000
college_medicine                     0.549133
college_physics                      0.352941
computer_security                    0.530000
conceptual_physics                   0.493617
econometrics                         0.377193
electrical_engineering               0.503448
elementary_mathematics               0.362434
formal_logic                         0.404762
global_facts                         0.310000
high_school_biology                  0.674194
high_school_chemistry                0.413793
high_school_computer_science         0.620000
high_school_european_history         0.709091
high_school_geography                0.686869
high_school_government_and_politics  0.616580
high_school_macroeconomics           0.507692
high_school_mathematics              0.355556
high_school_microeconomics           0.516807
high_school_physics                  0.377483
high_school_psychology               0.684404
high_school_statistics               0.467593
high_school_us_history               0.686275
high_school_world_history            0.725738
human_aging                          0.529148
human_sexuality                      0.610687
international_law                    0.652893
jurisprudence                        0.601852
logical_fallacies                    0.509202
machine_learning                     0.330357
management                           0.669903
marketing                            0.722222
medical_genetics                     0.550000
miscellaneous                        0.627075
moral_disputes                       0.572254
moral_scenarios                      0.218994
nutrition                            0.601307
philosophy                           0.598071
prehistory                           0.533951
professional_accounting              0.382979
professional_law                     0.355280
professional_medicine                0.533088
professional_psychology              0.486928
public_relations                     0.563636
security_studies                     0.616327
sociology                            0.726368
us_foreign_policy                    0.730000
virology                             0.463855
world_religions                      0.678363
INFO: 2024-08-28 09:49:59,130: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.449777
humanities                       0.557440
other (business, health, misc.)  0.534544
social sciences                  0.593624
INFO: 2024-08-28 09:49:59,135: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.53384650825764}
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Processing Dataset: 377.93s
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-08-28 09:50:43,882: llmtf.base.daru/treewayextractive: {'r-prec': 0.4306020202020202}
INFO: 2024-08-28 09:50:43,926: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:50:43,927: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.604	0.241	0.431	0.210	0.610	0.437	0.569	0.093	1.000	0.970	0.998	0.995	0.502	0.725	0.865	0.534	0.485
INFO: 2024-08-28 09:51:38,740: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 99.57s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 320.48s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-08-28 09:56:59,279: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.310000
anatomy                              0.622222
astronomy                            0.723684
business_ethics                      0.600000
clinical_knowledge                   0.720755
college_biology                      0.777778
college_chemistry                    0.460000
college_computer_science             0.550000
college_mathematics                  0.390000
college_medicine                     0.647399
college_physics                      0.421569
computer_security                    0.780000
conceptual_physics                   0.561702
econometrics                         0.412281
electrical_engineering               0.572414
elementary_mathematics               0.447090
formal_logic                         0.523810
global_facts                         0.440000
high_school_biology                  0.803226
high_school_chemistry                0.512315
high_school_computer_science         0.710000
high_school_european_history         0.727273
high_school_geography                0.772727
high_school_government_and_politics  0.849741
high_school_macroeconomics           0.661538
high_school_mathematics              0.374074
high_school_microeconomics           0.747899
high_school_physics                  0.417219
high_school_psychology               0.840367
high_school_statistics               0.611111
high_school_us_history               0.789216
high_school_world_history            0.818565
human_aging                          0.677130
human_sexuality                      0.763359
international_law                    0.743802
jurisprudence                        0.777778
logical_fallacies                    0.773006
machine_learning                     0.464286
management                           0.825243
marketing                            0.854701
medical_genetics                     0.760000
miscellaneous                        0.817369
moral_disputes                       0.664740
moral_scenarios                      0.448045
nutrition                            0.735294
philosophy                           0.678457
prehistory                           0.682099
professional_accounting              0.524823
professional_law                     0.453716
professional_medicine                0.764706
professional_psychology              0.643791
public_relations                     0.654545
security_studies                     0.685714
sociology                            0.830846
us_foreign_policy                    0.820000
virology                             0.500000
world_religions                      0.789474
INFO: 2024-08-28 09:56:59,286: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.549248
humanities                       0.682306
other (business, health, misc.)  0.677832
social sciences                  0.723567
INFO: 2024-08-28 09:56:59,291: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6582382725054904}
INFO: 2024-08-28 09:56:59,323: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:56:59,325: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.607	0.241	0.431	0.210	0.610	0.437	0.569	0.093	1.000	0.970	0.998	0.995	0.502	0.725	0.865	0.658	0.534	0.485