INFO: 2024-08-28 09:38:05,098: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom']
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:07,960: llmtf.base.darumeru/PARus: Loading Dataset: 2.86s
INFO: 2024-08-28 09:38:07,992: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive']
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:09,381: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu']
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Processing Dataset: 3.52s
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-08-28 09:38:11,487: llmtf.base.darumeru/PARus: {'acc': 0.61}
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,100: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,501: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s
INFO: 2024-08-28 09:38:13,809: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use']
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:15,664: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru']
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:16,575: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.58s
INFO: 2024-08-28 09:38:17,521: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-08-28 09:38:17,560: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.46s
INFO: 2024-08-28 09:38:17,771: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en']
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:18,163: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.50s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Processing Dataset: 5.47s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-08-28 09:38:18,971: llmtf.base.darumeru/RCB: {'acc': 0.4590909090909091, 'f1_macro': 0.41511023060616065}
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:20,244: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.47s
INFO: 2024-08-28 09:38:21,566: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.59s
INFO: 2024-08-28 09:38:59,395: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 37.83s
INFO: 2024-08-28 09:38:59,396: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-08-28 09:38:59,405: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7246563573883161, 'f1_macro': 0.7254261079279148}
INFO: 2024-08-28 09:38:59,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:59,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:01,164: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 1.75s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.01s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-08-28 09:39:03,179: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8640387481371088}
INFO: 2024-08-28 09:39:03,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:03,180: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:05,009: llmtf.base.darumeru/RWSD: Loading Dataset: 1.83s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Processing Dataset: 5.96s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-08-28 09:39:10,970: llmtf.base.darumeru/RWSD: {'acc': 0.5686274509803921}
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:14,992: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.02s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 43.71s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-08-28 09:39:58,716: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7100825260136348, 'mcc': 0.2607217783495962}
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.615 0.610 0.437 0.569 0.725 0.865 0.485
INFO: 2024-08-28 09:41:10,442: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 170.20s
INFO: 2024-08-28 09:41:10,443: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-08-28 09:41:10,444: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.394786509645572, 'len': 0.9984205596649908, 'lcs': 0.9939024390243902}
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:12,462: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.02s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 177.42s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-08-28 09:41:15,581: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.705603375484548, 'len': 0.9949612573353719, 'lcs': 0.9404517453798767}
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:17,523: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.94s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Processing Dataset: 143.27s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-08-28 09:43:35,735: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.521261233892721, 'len': 0.9996383092887717, 'lcs': 1.0}
INFO: 2024-08-28 09:43:35,735: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:35,736: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.743 0.610 0.437 0.569 1.000 0.998 0.995 0.725 0.865 0.485
INFO: 2024-08-28 09:43:52,340: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 342.96s
INFO: 2024-08-28 09:43:56,137: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 158.61s
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7865670306157933, 'len': 0.9983390643026695, 'lcs': 0.97}
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.765 0.610 0.437 0.569 1.000 0.970 0.998 0.995 0.725 0.865 0.485
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Processing Dataset: 355.94s
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-08-28 09:44:12,524: llmtf.base.darumeru/ruMMLU: {'acc': 0.5018457547640427}
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:25,707: llmtf.base.daru/treewayextractive: Loading Dataset: 13.14s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Processing Dataset: 376.40s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-08-28 09:44:33,922: llmtf.base.darumeru/MultiQ: {'f1': 0.2660316706577536, 'em': 0.15487571701720843}
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:37,006: llmtf.base.darumeru/USE: Loading Dataset: 3.08s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Processing Dataset: 452.34s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-08-28 09:45:49,906: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3550759319426331, 'rouge2': 0.12663323877762525}
INFO: 2024-08-28 09:45:49,909: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:45:49,910: llmtf.base.evaluator:
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.662 0.241 0.210 0.610 0.437 0.569 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.485
INFO: 2024-08-28 09:48:00,114: llmtf.base.darumeru/USE: Processing Dataset: 203.11s
INFO: 2024-08-28 09:48:00,115: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-08-28 09:48:00,116: llmtf.base.darumeru/USE: {'grade_norm': 0.0931372549019608}
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator:
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.622 0.241 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.485
INFO: 2024-08-28 09:49:59,058: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.72s
INFO: 2024-08-28 09:49:59,059: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-08-28 09:49:59,121: llmtf.base.nlpcoreteam/ruMMLU:
metric
subject
abstract_algebra 0.270000
anatomy 0.400000
astronomy 0.677632
business_ethics 0.560000
clinical_knowledge 0.584906
college_biology 0.506944
college_chemistry 0.420000
college_computer_science 0.430000
college_mathematics 0.310000
college_medicine 0.549133
college_physics 0.352941
computer_security 0.530000
conceptual_physics 0.493617
econometrics 0.377193
electrical_engineering 0.503448
elementary_mathematics 0.362434
formal_logic 0.404762
global_facts 0.310000
high_school_biology 0.674194
high_school_chemistry 0.413793
high_school_computer_science 0.620000
high_school_european_history 0.709091
high_school_geography 0.686869
high_school_government_and_politics 0.616580
high_school_macroeconomics 0.507692
high_school_mathematics 0.355556
high_school_microeconomics 0.516807
high_school_physics 0.377483
high_school_psychology 0.684404
high_school_statistics 0.467593
high_school_us_history 0.686275
high_school_world_history 0.725738
human_aging 0.529148
human_sexuality 0.610687
international_law 0.652893
jurisprudence 0.601852
logical_fallacies 0.509202
machine_learning 0.330357
management 0.669903
marketing 0.722222
medical_genetics 0.550000
miscellaneous 0.627075
moral_disputes 0.572254
moral_scenarios 0.218994
nutrition 0.601307
philosophy 0.598071
prehistory 0.533951
professional_accounting 0.382979
professional_law 0.355280
professional_medicine 0.533088
professional_psychology 0.486928
public_relations 0.563636
security_studies 0.616327
sociology 0.726368
us_foreign_policy 0.730000
virology 0.463855
world_religions 0.678363
INFO: 2024-08-28 09:49:59,130: llmtf.base.nlpcoreteam/ruMMLU:
metric
subject
STEM 0.449777
humanities 0.557440
other (business, health, misc.) 0.534544
social sciences 0.593624
INFO: 2024-08-28 09:49:59,135: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.53384650825764}
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Processing Dataset: 377.93s
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-08-28 09:50:43,882: llmtf.base.daru/treewayextractive: {'r-prec': 0.4306020202020202}
INFO: 2024-08-28 09:50:43,926: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:50:43,927: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.604 0.241 0.431 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.534 0.485
INFO: 2024-08-28 09:51:38,740: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 99.57s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 320.48s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-08-28 09:56:59,279: llmtf.base.nlpcoreteam/enMMLU:
metric
subject
abstract_algebra 0.310000
anatomy 0.622222
astronomy 0.723684
business_ethics 0.600000
clinical_knowledge 0.720755
college_biology 0.777778
college_chemistry 0.460000
college_computer_science 0.550000
college_mathematics 0.390000
college_medicine 0.647399
college_physics 0.421569
computer_security 0.780000
conceptual_physics 0.561702
econometrics 0.412281
electrical_engineering 0.572414
elementary_mathematics 0.447090
formal_logic 0.523810
global_facts 0.440000
high_school_biology 0.803226
high_school_chemistry 0.512315
high_school_computer_science 0.710000
high_school_european_history 0.727273
high_school_geography 0.772727
high_school_government_and_politics 0.849741
high_school_macroeconomics 0.661538
high_school_mathematics 0.374074
high_school_microeconomics 0.747899
high_school_physics 0.417219
high_school_psychology 0.840367
high_school_statistics 0.611111
high_school_us_history 0.789216
high_school_world_history 0.818565
human_aging 0.677130
human_sexuality 0.763359
international_law 0.743802
jurisprudence 0.777778
logical_fallacies 0.773006
machine_learning 0.464286
management 0.825243
marketing 0.854701
medical_genetics 0.760000
miscellaneous 0.817369
moral_disputes 0.664740
moral_scenarios 0.448045
nutrition 0.735294
philosophy 0.678457
prehistory 0.682099
professional_accounting 0.524823
professional_law 0.453716
professional_medicine 0.764706
professional_psychology 0.643791
public_relations 0.654545
security_studies 0.685714
sociology 0.830846
us_foreign_policy 0.820000
virology 0.500000
world_religions 0.789474
INFO: 2024-08-28 09:56:59,286: llmtf.base.nlpcoreteam/enMMLU:
metric
subject
STEM 0.549248
humanities 0.682306
other (business, health, misc.) 0.677832
social sciences 0.723567
INFO: 2024-08-28 09:56:59,291: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6582382725054904}
INFO: 2024-08-28 09:56:59,323: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:56:59,325: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.607 0.241 0.431 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.658 0.534 0.485