INFO: 2024-08-28 09:38:05,098: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom']
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:07,960: llmtf.base.darumeru/PARus: Loading Dataset: 2.86s
INFO: 2024-08-28 09:38:07,992: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive']
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:09,381: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu']
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Processing Dataset: 3.52s
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-08-28 09:38:11,487: llmtf.base.darumeru/PARus: {'acc': 0.61}
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,100: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:13,501: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s
INFO: 2024-08-28 09:38:13,809: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use']
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:15,664: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru']
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:16,575: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.58s
INFO: 2024-08-28 09:38:17,521: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-08-28 09:38:17,560: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.46s
INFO: 2024-08-28 09:38:17,771: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en']
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:18,163: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.50s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Processing Dataset: 5.47s
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-08-28 09:38:18,971: llmtf.base.darumeru/RCB: {'acc': 0.4590909090909091, 'f1_macro': 0.41511023060616065}
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:38:20,244: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.47s
INFO: 2024-08-28 09:38:21,566: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.59s
INFO: 2024-08-28 09:38:59,395: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 37.83s
INFO: 2024-08-28 09:38:59,396: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-08-28 09:38:59,405: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7246563573883161, 'f1_macro': 0.7254261079279148}
INFO: 2024-08-28 09:38:59,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:38:59,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:01,164: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 1.75s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.01s
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-08-28 09:39:03,179: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8640387481371088}
INFO: 2024-08-28 09:39:03,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:03,180: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:05,009: llmtf.base.darumeru/RWSD: Loading Dataset: 1.83s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Processing Dataset: 5.96s
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-08-28 09:39:10,970: llmtf.base.darumeru/RWSD: {'acc': 0.5686274509803921}
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:39:14,992: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.02s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 43.71s
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-08-28 09:39:58,716: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7100825260136348, 'mcc': 0.2607217783495962}
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.615  0.610           0.437         0.569          0.725                  0.865                 0.485
INFO: 2024-08-28 09:41:10,442: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 170.20s
INFO: 2024-08-28 09:41:10,443: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-08-28 09:41:10,444: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.394786509645572, 'len': 0.9984205596649908, 'lcs': 0.9939024390243902}
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:12,462: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.02s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 177.42s
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-08-28 09:41:15,581: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.705603375484548, 'len': 0.9949612573353719, 'lcs': 0.9404517453798767}
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:41:17,523: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.94s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Processing Dataset: 143.27s
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-08-28 09:43:35,735: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.521261233892721, 'len': 0.9996383092887717, 'lcs': 1.0}
INFO: 2024-08-28 09:43:35,735: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:35,736: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_en  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.743  0.610           0.437         0.569          1.000                0.998                0.995                0.725                  0.865                 0.485
INFO: 2024-08-28 09:43:52,340: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 342.96s
INFO: 2024-08-28 09:43:56,137: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 158.61s
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7865670306157933, 'len': 0.9983390643026695, 'lcs': 0.97}
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.765  0.610           0.437         0.569          1.000                0.970                0.998                0.995                0.725                  0.865                 0.485
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Processing Dataset: 355.94s
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-08-28 09:44:12,524: llmtf.base.darumeru/ruMMLU: {'acc': 0.5018457547640427}
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:25,707: llmtf.base.daru/treewayextractive: Loading Dataset: 13.14s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Processing Dataset: 376.40s
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-08-28 09:44:33,922: llmtf.base.darumeru/MultiQ: {'f1': 0.2660316706577536, 'em': 0.15487571701720843}
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:44:37,006: llmtf.base.darumeru/USE: Loading Dataset: 3.08s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Processing Dataset: 452.34s
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-08-28 09:45:49,906: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3550759319426331, 'rouge2': 0.12663323877762525}
INFO: 2024-08-28 09:45:49,909: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:45:49,910: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.662  0.241                    0.210            0.610           0.437         0.569          1.000                0.970                0.998                0.995                0.502            0.725                  0.865                 0.485
INFO: 2024-08-28 09:48:00,114: llmtf.base.darumeru/USE: Processing Dataset: 203.11s
INFO: 2024-08-28 09:48:00,115: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-08-28 09:48:00,116: llmtf.base.darumeru/USE: {'grade_norm': 0.0931372549019608}
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.622  0.241                    0.210            0.610           0.437         0.569          0.093         1.000                0.970                0.998                0.995                0.502            0.725                  0.865                 0.485
INFO: 2024-08-28 09:49:59,058: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.72s
INFO: 2024-08-28 09:49:59,059: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-08-28 09:49:59,121: llmtf.base.nlpcoreteam/ruMMLU:
                                     metric
subject
abstract_algebra                     0.270000
anatomy                              0.400000
astronomy                            0.677632
business_ethics                      0.560000
clinical_knowledge                   0.584906
college_biology                      0.506944
college_chemistry                    0.420000
college_computer_science             0.430000
college_mathematics                  0.310000
college_medicine                     0.549133
college_physics                      0.352941
computer_security                    0.530000
conceptual_physics                   0.493617
econometrics                         0.377193
electrical_engineering               0.503448
elementary_mathematics               0.362434
formal_logic                         0.404762
global_facts                         0.310000
high_school_biology                  0.674194
high_school_chemistry                0.413793
high_school_computer_science         0.620000
high_school_european_history         0.709091
high_school_geography                0.686869
high_school_government_and_politics  0.616580
high_school_macroeconomics           0.507692
high_school_mathematics              0.355556
high_school_microeconomics           0.516807
high_school_physics                  0.377483
high_school_psychology               0.684404
high_school_statistics               0.467593
high_school_us_history               0.686275
high_school_world_history            0.725738
human_aging                          0.529148
human_sexuality                      0.610687
international_law                    0.652893
jurisprudence                        0.601852
logical_fallacies                    0.509202
machine_learning                     0.330357
management                           0.669903
marketing                            0.722222
medical_genetics                     0.550000
miscellaneous                        0.627075
moral_disputes                       0.572254
moral_scenarios                      0.218994
nutrition                            0.601307
philosophy                           0.598071
prehistory                           0.533951
professional_accounting              0.382979
professional_law                     0.355280
professional_medicine                0.533088
professional_psychology              0.486928
public_relations                     0.563636
security_studies                     0.616327
sociology                            0.726368
us_foreign_policy                    0.730000
virology                             0.463855
world_religions                      0.678363
INFO: 2024-08-28 09:49:59,130: llmtf.base.nlpcoreteam/ruMMLU:
                                     metric
subject
STEM                                 0.449777
humanities                           0.557440
other (business, health, misc.)      0.534544
social sciences                      0.593624
INFO: 2024-08-28 09:49:59,135: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.53384650825764}
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570]
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>']
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Processing Dataset: 377.93s
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-08-28 09:50:43,882: llmtf.base.daru/treewayextractive: {'r-prec': 0.4306020202020202}
INFO: 2024-08-28 09:50:43,926: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:50:43,927: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.604  0.241                    0.431                   0.210            0.610           0.437         0.569          0.093         1.000                0.970                0.998                0.995                0.502            0.725                  0.865                 0.534               0.485
INFO: 2024-08-28 09:51:38,740: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 99.57s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 320.48s
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-08-28 09:56:59,279: llmtf.base.nlpcoreteam/enMMLU:
                                     metric
subject
abstract_algebra                     0.310000
anatomy                              0.622222
astronomy                            0.723684
business_ethics                      0.600000
clinical_knowledge                   0.720755
college_biology                      0.777778
college_chemistry                    0.460000
college_computer_science             0.550000
college_mathematics                  0.390000
college_medicine                     0.647399
college_physics                      0.421569
computer_security                    0.780000
conceptual_physics                   0.561702
econometrics                         0.412281
electrical_engineering               0.572414
elementary_mathematics               0.447090
formal_logic                         0.523810
global_facts                         0.440000
high_school_biology                  0.803226
high_school_chemistry                0.512315
high_school_computer_science         0.710000
high_school_european_history         0.727273
high_school_geography                0.772727
high_school_government_and_politics  0.849741
high_school_macroeconomics           0.661538
high_school_mathematics              0.374074
high_school_microeconomics           0.747899
high_school_physics                  0.417219
high_school_psychology               0.840367
high_school_statistics               0.611111
high_school_us_history               0.789216
high_school_world_history            0.818565
human_aging                          0.677130
human_sexuality                      0.763359
international_law                    0.743802
jurisprudence                        0.777778
logical_fallacies                    0.773006
machine_learning                     0.464286
management                           0.825243
marketing                            0.854701
medical_genetics                     0.760000
miscellaneous                        0.817369
moral_disputes                       0.664740
moral_scenarios                      0.448045
nutrition                            0.735294
philosophy                           0.678457
prehistory                           0.682099
professional_accounting              0.524823
professional_law                     0.453716
professional_medicine                0.764706
professional_psychology              0.643791
public_relations                     0.654545
security_studies                     0.685714
sociology                            0.830846
us_foreign_policy                    0.820000
virology                             0.500000
world_religions                      0.789474
INFO: 2024-08-28 09:56:59,286: llmtf.base.nlpcoreteam/enMMLU:
                                     metric
subject
STEM                                 0.549248
humanities                           0.682306
other (business, health, misc.)      0.677832
social sciences                      0.723567
INFO: 2024-08-28 09:56:59,291: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6582382725054904}
INFO: 2024-08-28 09:56:59,323: llmtf.base.evaluator: Ended eval
INFO: 2024-08-28 09:56:59,325: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.607  0.241                    0.431                   0.210            0.610           0.437         0.569          0.093         1.000                0.970                0.998                0.995                0.502            0.725                  0.865                 0.658               0.534               0.485