INFO: 2024-10-28 13:15:15,094: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-28 13:15:15,094: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:15:15,094: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:15:16,695: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-28 13:15:16,695: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:15:16,695: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:15:18,943: llmtf.base.darumeru/PARus: Loading Dataset: 2.25s
INFO: 2024-10-28 13:15:19,297: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.20s
INFO: 2024-10-28 13:15:22,318: llmtf.base.darumeru/PARus: Processing Dataset: 3.37s
INFO: 2024-10-28 13:15:22,318: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-28 13:15:22,329: llmtf.base.darumeru/PARus: {'acc': 0.78}
INFO: 2024-10-28 13:15:22,330: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:15:22,330: llmtf.base.evaluator:
mean darumeru/PARus
0.780 0.780
INFO: 2024-10-28 13:15:30,304: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-28 13:15:30,304: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:15:30,304: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:15:33,637: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.33s
INFO: 2024-10-28 13:16:05,173: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 31.54s
INFO: 2024-10-28 13:16:05,173: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-28 13:16:05,184: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8256013745704467, 'f1_macro': 0.8262484506706507}
INFO: 2024-10-28 13:16:05,191: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:16:05,192: llmtf.base.evaluator:
mean darumeru/PARus darumeru/ruOpenBookQA
0.803 0.780 0.826
INFO: 2024-10-28 13:16:13,923: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-28 13:16:13,923: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:16:13,923: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:16:16,429: llmtf.base.darumeru/RWSD: Loading Dataset: 2.51s
INFO: 2024-10-28 13:16:22,246: llmtf.base.darumeru/RWSD: Processing Dataset: 5.82s
INFO: 2024-10-28 13:16:22,246: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-28 13:16:22,247: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235}
INFO: 2024-10-28 13:16:22,248: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:16:22,249: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA
0.717 0.780 0.544 0.826
INFO: 2024-10-28 13:16:31,348: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-28 13:16:31,348: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:16:31,348: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:18:38,554: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 127.21s
INFO: 2024-10-28 13:20:06,478: llmtf.base.darumeru/MultiQ: Processing Dataset: 287.18s
INFO: 2024-10-28 13:20:06,479: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-28 13:20:06,480: llmtf.base.darumeru/MultiQ: {'f1': 0.2503859074384594, 'em': 0.14531548757170173}
INFO: 2024-10-28 13:20:06,488: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:20:06,489: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA
0.587 0.198 0.780 0.544 0.826
INFO: 2024-10-28 13:20:15,334: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-28 13:20:15,335: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:20:15,335: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:20:18,179: llmtf.base.darumeru/RCB: Loading Dataset: 2.84s
INFO: 2024-10-28 13:20:23,505: llmtf.base.darumeru/RCB: Processing Dataset: 5.33s
INFO: 2024-10-28 13:20:23,506: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-28 13:20:23,510: llmtf.base.darumeru/RCB: {'acc': 0.5863636363636363, 'f1_macro': 0.5032640286161413}
INFO: 2024-10-28 13:20:23,511: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:20:23,512: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA
0.579 0.198 0.780 0.545 0.544 0.826
INFO: 2024-10-28 13:20:32,046: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-28 13:20:32,046: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:20:32,046: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:20:34,403: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.36s
INFO: 2024-10-28 13:20:36,969: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.57s
INFO: 2024-10-28 13:20:36,969: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-28 13:20:36,972: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9038817229146561}
INFO: 2024-10-28 13:20:36,972: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:20:36,972: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.633 0.198 0.780 0.545 0.544 0.826 0.904
INFO: 2024-10-28 13:20:45,488: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-28 13:20:45,488: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:20:45,488: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:20:59,443: llmtf.base.daru/treewayextractive: Loading Dataset: 13.95s
INFO: 2024-10-28 13:23:49,533: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 310.98s
INFO: 2024-10-28 13:23:49,533: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-28 13:23:49,597: llmtf.base.nlpcoreteam/ruMMLU:
metric
subject
abstract_algebra 0.480000
anatomy 0.562963
astronomy 0.769737
business_ethics 0.640000
clinical_knowledge 0.671698
college_biology 0.673611
college_chemistry 0.470000
college_computer_science 0.670000
college_mathematics 0.440000
college_medicine 0.589595
college_physics 0.480392
computer_security 0.720000
conceptual_physics 0.655319
econometrics 0.482456
electrical_engineering 0.606897
elementary_mathematics 0.616402
formal_logic 0.428571
global_facts 0.370000
high_school_biology 0.809677
high_school_chemistry 0.571429
high_school_computer_science 0.770000
high_school_european_history 0.751515
high_school_geography 0.782828
high_school_government_and_politics 0.725389
high_school_macroeconomics 0.658974
high_school_mathematics 0.525926
high_school_microeconomics 0.705882
high_school_physics 0.463576
high_school_psychology 0.796330
high_school_statistics 0.606481
high_school_us_history 0.779412
high_school_world_history 0.801688
human_aging 0.632287
human_sexuality 0.717557
international_law 0.743802
jurisprudence 0.675926
logical_fallacies 0.662577
machine_learning 0.482143
management 0.747573
marketing 0.816239
medical_genetics 0.650000
miscellaneous 0.711367
moral_disputes 0.627168
moral_scenarios 0.244693
nutrition 0.689542
philosophy 0.646302
prehistory 0.660494
professional_accounting 0.439716
professional_law 0.411343
professional_medicine 0.613971
professional_psychology 0.591503
public_relations 0.545455
security_studies 0.665306
sociology 0.736318
us_foreign_policy 0.800000
virology 0.500000
world_religions 0.760234
INFO: 2024-10-28 13:23:49,606: llmtf.base.nlpcoreteam/ruMMLU:
metric
subject
STEM 0.600644
humanities 0.630286
other (business, health, misc.) 0.616782
social sciences 0.684000
INFO: 2024-10-28 13:23:49,611: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6329281396317665}
INFO: 2024-10-28 13:23:49,646: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:23:49,648: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.633 0.198 0.780 0.545 0.544 0.826 0.904 0.633
INFO: 2024-10-28 13:23:57,887: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-28 13:23:57,887: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:23:57,887: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:24:02,221: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.33s
INFO: 2024-10-28 13:26:15,188: llmtf.base.daru/treewayextractive: Processing Dataset: 315.74s
INFO: 2024-10-28 13:26:15,188: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-28 13:26:15,447: llmtf.base.daru/treewayextractive: {'r-prec': 0.40380281385281386}
INFO: 2024-10-28 13:26:15,501: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:26:15,503: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.604 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.633
INFO: 2024-10-28 13:26:24,206: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-28 13:26:24,207: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:26:24,207: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:26:48,154: llmtf.base.daru/treewayabstractive: Processing Dataset: 165.93s
INFO: 2024-10-28 13:26:48,154: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-28 13:26:48,155: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3489002151166006, 'rouge2': 0.12404569962254197}
INFO: 2024-10-28 13:26:48,156: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:26:48,157: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.563 0.236 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.633
INFO: 2024-10-28 13:28:23,832: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 119.62s
INFO: 2024-10-28 13:33:05,781: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 281.95s
INFO: 2024-10-28 13:33:05,781: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-28 13:33:05,844: llmtf.base.nlpcoreteam/enMMLU:
metric
subject
abstract_algebra 0.450000
anatomy 0.725926
astronomy 0.861842
business_ethics 0.750000
clinical_knowledge 0.762264
college_biology 0.854167
college_chemistry 0.510000
college_computer_science 0.720000
college_mathematics 0.470000
college_medicine 0.699422
college_physics 0.509804
computer_security 0.770000
conceptual_physics 0.706383
econometrics 0.605263
electrical_engineering 0.696552
elementary_mathematics 0.666667
formal_logic 0.492063
global_facts 0.420000
high_school_biology 0.861290
high_school_chemistry 0.620690
high_school_computer_science 0.840000
high_school_european_history 0.824242
high_school_geography 0.873737
high_school_government_and_politics 0.927461
high_school_macroeconomics 0.761538
high_school_mathematics 0.566667
high_school_microeconomics 0.873950
high_school_physics 0.582781
high_school_psychology 0.888073
high_school_statistics 0.708333
high_school_us_history 0.838235
high_school_world_history 0.860759
human_aging 0.762332
human_sexuality 0.786260
international_law 0.809917
jurisprudence 0.796296
logical_fallacies 0.828221
machine_learning 0.526786
management 0.854369
marketing 0.914530
medical_genetics 0.810000
miscellaneous 0.848020
moral_disputes 0.736994
moral_scenarios 0.459218
nutrition 0.797386
philosophy 0.723473
prehistory 0.805556
professional_accounting 0.556738
professional_law 0.507823
professional_medicine 0.742647
professional_psychology 0.750000
public_relations 0.636364
security_studies 0.759184
sociology 0.845771
us_foreign_policy 0.850000
virology 0.506024
world_religions 0.853801
INFO: 2024-10-28 13:33:05,852: llmtf.base.nlpcoreteam/enMMLU:
metric
subject
STEM 0.662331
humanities 0.733585
other (business, health, misc.) 0.724976
social sciences 0.796467
INFO: 2024-10-28 13:33:05,857: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7293395108036221}
INFO: 2024-10-28 13:33:05,908: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:33:05,910: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.580 0.236 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.729 0.633
INFO: 2024-10-28 13:33:14,562: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-28 13:33:14,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-28 13:33:14,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-28 13:33:17,057: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.49s
INFO: 2024-10-28 13:35:21,669: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 124.61s
INFO: 2024-10-28 13:35:21,670: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-28 13:35:21,670: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9953318595732386, 'len': 0.9990656928305265, 'lcs': 1.0}
INFO: 2024-10-28 13:35:21,671: llmtf.base.evaluator: Ended eval
INFO: 2024-10-28 13:35:21,672: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.618 0.236 0.404 0.198 0.780 0.545 0.544 1.000 0.826 0.904 0.729 0.633