INFO: 2024-10-17 21:30:14,019: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:30:20,481: llmtf.base.darumeru/MultiQ: Loading Dataset: 6.46s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Processing Dataset: 339.11s
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-17 21:35:59,594: llmtf.base.darumeru/MultiQ: {'f1': 0.3346248767848689, 'em': 0.22275334608030592}
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator:
 mean  darumeru/MultiQ
0.279            0.279
INFO: 2024-10-17 21:36:08,809: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:12,969: llmtf.base.darumeru/PARus: Loading Dataset: 4.16s
INFO: 2024-10-17 21:36:18,316: llmtf.base.darumeru/PARus: Processing Dataset: 5.35s
INFO: 2024-10-17 21:36:18,317: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-17 21:36:18,327: llmtf.base.darumeru/PARus: {'acc': 0.7}
INFO: 2024-10-17 21:36:18,327: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:18,328: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus
0.489            0.279           0.700
INFO: 2024-10-17 21:36:27,550: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-17 21:36:27,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:27,551: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:31,450: llmtf.base.darumeru/RCB: Loading Dataset: 3.90s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Processing Dataset: 7.23s
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-17 21:36:38,686: llmtf.base.darumeru/RCB: {'acc': 0.5454545454545454, 'f1_macro': 0.49090309951702227}
INFO: 2024-10-17 21:36:38,687: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:36:38,688: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB
0.499            0.279           0.700         0.518
INFO: 2024-10-17 21:36:48,734: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:36:54,900: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 6.17s
INFO: 2024-10-17 21:38:00,519: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 65.62s
INFO: 2024-10-17 21:38:00,520: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-17 21:38:00,532: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7302405498281787, 'f1_macro': 0.7304546157096631}
INFO: 2024-10-17 21:38:00,541: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:00,542: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA
0.557            0.279           0.700         0.518                  0.730
INFO: 2024-10-17 21:38:09,745: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:14,102: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 4.36s
INFO: 2024-10-17 21:38:16,932: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.83s
INFO: 2024-10-17 21:38:16,933: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-17 21:38:16,936: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9043404138496471}
INFO: 2024-10-17 21:38:16,936: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:16,937: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.626            0.279           0.700         0.518                  0.730                 0.905
INFO: 2024-10-17 21:38:26,077: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:38:30,781: llmtf.base.darumeru/RWSD: Loading Dataset: 4.70s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Processing Dataset: 5.72s
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-17 21:38:36,498: llmtf.base.darumeru/RWSD: {'acc': 0.6029411764705882}
INFO: 2024-10-17 21:38:36,499: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:38:36,500: llmtf.base.evaluator:
 mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.622            0.279           0.700         0.518          0.603                  0.730                 0.905
INFO: 2024-10-17 21:38:45,688: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:39:02,002: llmtf.base.daru/treewayextractive: Loading Dataset: 16.31s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Processing Dataset: 183.77s
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-17 21:42:06,010: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917218614718615}
INFO: 2024-10-17 21:42:06,052: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:42:06,054: llmtf.base.evaluator:
 mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.589                   0.392            0.279           0.700         0.518          0.603                  0.730                 0.905
INFO: 2024-10-17 21:42:15,170: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:46:47,282: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 272.11s
INFO: 2024-10-17 21:56:29,398: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 582.12s
INFO: 2024-10-17 21:56:29,399: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-17 21:56:29,464: llmtf.base.nlpcoreteam/ruMMLU:
subject  metric
abstract_algebra  0.340000
anatomy  0.414815
astronomy  0.611842
business_ethics  0.610000
clinical_knowledge  0.554717
college_biology  0.548611
college_chemistry  0.380000
college_computer_science  0.450000
college_mathematics  0.400000
college_medicine  0.526012
college_physics  0.470588
computer_security  0.620000
conceptual_physics  0.565957
econometrics  0.377193
electrical_engineering  0.537931
elementary_mathematics  0.529101
formal_logic  0.365079
global_facts  0.360000
high_school_biology  0.664516
high_school_chemistry  0.487685
high_school_computer_science  0.700000
high_school_european_history  0.751515
high_school_geography  0.722222
high_school_government_and_politics  0.564767
high_school_macroeconomics  0.528205
high_school_mathematics  0.433333
high_school_microeconomics  0.533613
high_school_physics  0.403974
high_school_psychology  0.713761
high_school_statistics  0.523148
high_school_us_history  0.661765
high_school_world_history  0.717300
human_aging  0.587444
human_sexuality  0.618321
international_law  0.735537
jurisprudence  0.666667
logical_fallacies  0.564417
machine_learning  0.392857
management  0.650485
marketing  0.752137
medical_genetics  0.580000
miscellaneous  0.632184
moral_disputes  0.583815
moral_scenarios  0.299441
nutrition  0.637255
philosophy  0.617363
prehistory  0.561728
professional_accounting  0.386525
professional_law  0.377445
professional_medicine  0.481618
professional_psychology  0.516340
public_relations  0.500000
security_studies  0.648980
sociology  0.756219
us_foreign_policy  0.720000
virology  0.439759
world_religions  0.719298
INFO: 2024-10-17 21:56:29,473: llmtf.base.nlpcoreteam/ruMMLU:
subject  metric
STEM  0.503308
humanities  0.586259
other (business, health, misc.)  0.543782
social sciences  0.599968
INFO: 2024-10-17 21:56:29,478: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5583294528508019}
INFO: 2024-10-17 21:56:29,516: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 21:56:29,518: llmtf.base.evaluator:
 mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.586                   0.392            0.279           0.700         0.518          0.603                  0.730                 0.905               0.558
INFO: 2024-10-17 21:56:39,535: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 21:58:54,966: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 135.43s
INFO: 2024-10-17 22:08:04,419: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 549.45s
INFO: 2024-10-17 22:08:04,426: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-17 22:08:04,492: llmtf.base.nlpcoreteam/enMMLU:
subject  metric
abstract_algebra  0.380000
anatomy  0.637037
astronomy  0.717105
business_ethics  0.700000
clinical_knowledge  0.705660
college_biology  0.715278
college_chemistry  0.470000
college_computer_science  0.580000
college_mathematics  0.330000
college_medicine  0.664740
college_physics  0.509804
computer_security  0.740000
conceptual_physics  0.642553
econometrics  0.508772
electrical_engineering  0.600000
elementary_mathematics  0.547619
formal_logic  0.412698
global_facts  0.360000
high_school_biology  0.783871
high_school_chemistry  0.581281
high_school_computer_science  0.710000
high_school_european_history  0.800000
high_school_geography  0.757576
high_school_government_and_politics  0.854922
high_school_macroeconomics  0.679487
high_school_mathematics  0.455556
high_school_microeconomics  0.773109
high_school_physics  0.437086
high_school_psychology  0.844037
high_school_statistics  0.652778
high_school_us_history  0.833333
high_school_world_history  0.843882
human_aging  0.677130
human_sexuality  0.786260
international_law  0.768595
jurisprudence  0.814815
logical_fallacies  0.803681
machine_learning  0.446429
management  0.786408
marketing  0.858974
medical_genetics  0.760000
miscellaneous  0.795658
moral_disputes  0.667630
moral_scenarios  0.311732
nutrition  0.732026
philosophy  0.704180
prehistory  0.712963
professional_accounting  0.503546
professional_law  0.457627
professional_medicine  0.658088
professional_psychology  0.668301
public_relations  0.709091
security_studies  0.697959
sociology  0.800995
us_foreign_policy  0.800000
virology  0.506024
world_religions  0.801170
INFO: 2024-10-17 22:08:04,506: llmtf.base.nlpcoreteam/enMMLU:
subject  metric
STEM  0.572187
humanities  0.687100
other (business, health, misc.)  0.667521
social sciences  0.740042
INFO: 2024-10-17 22:08:04,511: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6667125709237595}
INFO: 2024-10-17 22:08:04,554: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:08:04,556: llmtf.base.evaluator:
 mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.595                   0.392            0.279           0.700         0.518          0.603                  0.730                 0.905               0.667               0.558
INFO: 2024-10-17 22:08:14,512: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:08:18,791: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.28s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Processing Dataset: 207.47s
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-17 22:11:46,261: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33109987599556284, 'rouge2': 0.11202889150257295}
INFO: 2024-10-17 22:11:46,262: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:11:46,263: llmtf.base.evaluator:
 mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.557                    0.222                   0.392            0.279           0.700         0.518          0.603                  0.730                 0.905               0.667               0.558
INFO: 2024-10-17 22:11:55,717: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 22:11:59,846: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 150.13s
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-17 22:14:29,976: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.993754090002875, 'len': 0.9986883734384026, 'lcs': 0.98}
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator:
 mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.596                    0.222                   0.392            0.279           0.700         0.518          0.603                0.980                  0.730                 0.905               0.667               0.558