INFO: 2024-10-17 07:12:53,947: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-17 07:12:53,947: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:12:53,947: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:13:01,539: llmtf.base.darumeru/MultiQ: Loading Dataset: 7.59s
INFO: 2024-10-17 07:18:20,829: llmtf.base.darumeru/MultiQ: Processing Dataset: 319.29s
INFO: 2024-10-17 07:18:20,829: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-17 07:18:20,830: llmtf.base.darumeru/MultiQ: {'f1': 0.3485719410941241, 'em': 0.24282982791587}
INFO: 2024-10-17 07:18:20,835: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:20,835: llmtf.base.evaluator:
mean   darumeru/MultiQ
0.296  0.296
INFO: 2024-10-17 07:18:30,261: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-17 07:18:30,261: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:18:30,261: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:18:34,809: llmtf.base.darumeru/PARus: Loading Dataset: 4.55s
INFO: 2024-10-17 07:18:39,184: llmtf.base.darumeru/PARus: Processing Dataset: 4.37s
INFO: 2024-10-17 07:18:39,184: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-17 07:18:39,194: llmtf.base.darumeru/PARus: {'acc': 0.68}
INFO: 2024-10-17 07:18:39,194: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:39,195: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus
0.488  0.296            0.680
INFO: 2024-10-17 07:18:48,257: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-17 07:18:48,258: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:18:48,258: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:18:52,169: llmtf.base.darumeru/RCB: Loading Dataset: 3.91s
INFO: 2024-10-17 07:18:57,742: llmtf.base.darumeru/RCB: Processing Dataset: 5.57s
INFO: 2024-10-17 07:18:57,742: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-17 07:18:57,745: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.47584611730940257}
INFO: 2024-10-17 07:18:57,746: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:57,747: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB
0.492  0.296            0.680           0.502
INFO: 2024-10-17 07:19:07,388: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-17 07:19:07,388: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:19:07,388: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:19:13,124: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 5.74s
INFO: 2024-10-17 07:20:12,666: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 59.54s
INFO: 2024-10-17 07:20:12,666: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-17 07:20:12,678: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7207903780068728, 'f1_macro': 0.7206838429510474}
INFO: 2024-10-17 07:20:12,689: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:12,690: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA
0.549  0.296            0.680           0.502         0.721
INFO: 2024-10-17 07:20:21,945: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-17 07:20:21,945: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:21,945: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:20:25,640: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.69s
INFO: 2024-10-17 07:20:28,309: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.67s
INFO: 2024-10-17 07:20:28,310: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-17 07:20:28,312: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8952380952380953, 'f1_macro': 0.8944916936662219}
INFO: 2024-10-17 07:20:28,313: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:28,314: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.619  0.296            0.680           0.502         0.721                  0.895
INFO: 2024-10-17 07:20:37,966: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-17 07:20:37,967: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:37,967: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:20:42,582: llmtf.base.darumeru/RWSD: Loading Dataset: 4.62s
INFO: 2024-10-17 07:20:47,988: llmtf.base.darumeru/RWSD: Processing Dataset: 5.41s
INFO: 2024-10-17 07:20:47,988: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-17 07:20:47,989: llmtf.base.darumeru/RWSD: {'acc': 0.5343137254901961}
INFO: 2024-10-17 07:20:47,989: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:47,990: llmtf.base.evaluator:
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.605  0.296            0.680           0.502         0.534          0.721                  0.895
INFO: 2024-10-17 07:20:57,317: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-17 07:20:57,317: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:57,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:21:13,664: llmtf.base.daru/treewayextractive: Loading Dataset: 16.35s
INFO: 2024-10-17 07:24:01,803: llmtf.base.daru/treewayextractive: Processing Dataset: 168.14s
INFO: 2024-10-17 07:24:01,803: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-17 07:24:02,038: llmtf.base.daru/treewayextractive: {'r-prec': 0.3983020202020202}
INFO: 2024-10-17 07:24:02,084: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:24:02,085: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.575  0.398                   0.296            0.680           0.502         0.534          0.721                  0.895
INFO: 2024-10-17 07:24:11,344: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-17 07:24:11,345: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:24:11,345: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:29:12,497: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 301.15s
INFO: 2024-10-17 07:35:18,210: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 365.71s
INFO: 2024-10-17 07:35:18,210: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-17 07:35:18,279: llmtf.base.nlpcoreteam/ruMMLU:
subject                              metric
abstract_algebra                     0.330000
anatomy                              0.422222
astronomy                            0.625000
business_ethics                      0.580000
clinical_knowledge                   0.592453
college_biology                      0.506944
college_chemistry                    0.340000
college_computer_science             0.540000
college_mathematics                  0.370000
college_medicine                     0.549133
college_physics                      0.431373
computer_security                    0.570000
conceptual_physics                   0.536170
econometrics                         0.385965
electrical_engineering               0.531034
elementary_mathematics               0.515873
formal_logic                         0.333333
global_facts                         0.390000
high_school_biology                  0.670968
high_school_chemistry                0.487685
high_school_computer_science         0.660000
high_school_european_history         0.733333
high_school_geography                0.696970
high_school_government_and_politics  0.569948
high_school_macroeconomics           0.523077
high_school_mathematics              0.429630
high_school_microeconomics           0.521008
high_school_physics                  0.443709
high_school_psychology               0.706422
high_school_statistics               0.523148
high_school_us_history               0.642157
high_school_world_history            0.729958
human_aging                          0.587444
human_sexuality                      0.641221
international_law                    0.694215
jurisprudence                        0.638889
logical_fallacies                    0.533742
machine_learning                     0.419643
management                           0.650485
marketing                            0.726496
medical_genetics                     0.550000
miscellaneous                        0.629630
moral_disputes                       0.575145
moral_scenarios                      0.248045
nutrition                            0.614379
philosophy                           0.643087
prehistory                           0.546296
professional_accounting              0.358156
professional_law                     0.373533
professional_medicine                0.500000
professional_psychology              0.495098
public_relations                     0.500000
security_studies                     0.665306
sociology                            0.701493
us_foreign_policy                    0.700000
virology                             0.433735
world_religions                      0.672515
INFO: 2024-10-17 07:35:18,289: llmtf.base.nlpcoreteam/ruMMLU:
subject                          metric
STEM                             0.496176
humanities                       0.566481
other (business, health, misc.)  0.541724
social sciences                  0.592209
INFO: 2024-10-17 07:35:18,294: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.549147460511024}
INFO: 2024-10-17 07:35:18,341: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:35:18,343: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.572  0.398                   0.296            0.680           0.502         0.534          0.721                  0.895                 0.549
INFO: 2024-10-17 07:35:27,953: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-17 07:35:27,953: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:35:27,953: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:37:30,758: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 122.80s
INFO: 2024-10-17 07:43:10,625: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 339.87s
INFO: 2024-10-17 07:43:10,626: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-17 07:43:10,691: llmtf.base.nlpcoreteam/enMMLU:
subject                              metric
abstract_algebra                     0.370000
anatomy                              0.622222
astronomy                            0.697368
business_ethics                      0.670000
clinical_knowledge                   0.709434
college_biology                      0.701389
college_chemistry                    0.450000
college_computer_science             0.570000
college_mathematics                  0.360000
college_medicine                     0.670520
college_physics                      0.480392
computer_security                    0.720000
conceptual_physics                   0.655319
econometrics                         0.500000
electrical_engineering               0.565517
elementary_mathematics               0.539683
formal_logic                         0.357143
global_facts                         0.370000
high_school_biology                  0.800000
high_school_chemistry                0.561576
high_school_computer_science         0.670000
high_school_european_history         0.763636
high_school_geography                0.772727
high_school_government_and_politics  0.849741
high_school_macroeconomics           0.679487
high_school_mathematics              0.440741
high_school_microeconomics           0.756303
high_school_physics                  0.450331
high_school_psychology               0.849541
high_school_statistics               0.643519
high_school_us_history               0.813725
high_school_world_history            0.835443
human_aging                          0.695067
human_sexuality                      0.763359
international_law                    0.768595
jurisprudence                        0.787037
logical_fallacies                    0.779141
machine_learning                     0.464286
management                           0.805825
marketing                            0.884615
medical_genetics                     0.750000
miscellaneous                        0.784163
moral_disputes                       0.650289
moral_scenarios                      0.270391
nutrition                            0.718954
philosophy                           0.717042
prehistory                           0.737654
professional_accounting              0.496454
professional_law                     0.458931
professional_medicine                0.672794
professional_psychology              0.668301
public_relations                     0.681818
security_studies                     0.718367
sociology                            0.810945
us_foreign_policy                    0.790000
virology                             0.487952
world_religions                      0.812865
INFO: 2024-10-17 07:43:10,700: llmtf.base.nlpcoreteam/enMMLU:
subject                          metric
STEM                             0.563340
humanities                       0.673223
other (business, health, misc.)  0.667000
social sciences                  0.736716
INFO: 2024-10-17 07:43:10,705: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6600696360837558}
INFO: 2024-10-17 07:43:10,741: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:43:10,743: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.582  0.398                   0.296            0.680           0.502         0.534          0.721                  0.895                 0.660               0.549
INFO: 2024-10-17 07:43:20,115: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-17 07:43:20,115: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:43:20,115: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:43:24,372: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.26s
INFO: 2024-10-17 07:47:01,407: llmtf.base.daru/treewayabstractive: Processing Dataset: 217.03s
INFO: 2024-10-17 07:47:01,407: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-17 07:47:01,408: llmtf.base.daru/treewayabstractive: {'rouge1': 0.32720307606797727, 'rouge2': 0.10857945570692258}
INFO: 2024-10-17 07:47:01,409: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:47:01,410: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.545  0.218                    0.398                   0.296            0.680           0.502         0.534          0.721                  0.895                 0.660               0.549
INFO: 2024-10-17 07:47:10,811: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-17 07:47:10,811: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:47:10,811: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:47:15,676: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.86s
INFO: 2024-10-17 07:49:51,029: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 155.35s
INFO: 2024-10-17 07:49:51,030: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-17 07:49:51,031: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.76859951568896, 'len': 0.9950709951674359, 'lcs': 0.9}
INFO: 2024-10-17 07:49:51,031: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:49:51,032: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.578  0.218                    0.398                   0.296            0.680           0.502         0.534          0.900                0.721                  0.895                 0.660               0.549