INFO: 2024-10-26 10:00:21,600: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:23,554: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:25,061: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:25,825: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.22s
INFO: 2024-10-26 10:00:25,839: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s
INFO: 2024-10-26 10:00:27,118: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:27,739: llmtf.base.darumeru/RCB: Loading Dataset: 2.68s
INFO: 2024-10-26 10:00:29,086: llmtf.base.darumeru/PARus: Processing Dataset: 3.25s
INFO: 2024-10-26 10:00:29,088: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-26 10:00:29,102: llmtf.base.darumeru/PARus: {'acc': 0.8}
INFO: 2024-10-26 10:00:29,102: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:29,105: llmtf.base.evaluator:
mean darumeru/PARus
0.800 0.800
INFO: 2024-10-26 10:00:30,121: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:30,879: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.76s
INFO: 2024-10-26 10:00:31,330: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:32,861: llmtf.base.darumeru/RCB: Processing Dataset: 5.12s
INFO: 2024-10-26 10:00:32,862: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-26 10:00:32,870: llmtf.base.darumeru/RCB: {'acc': 0.5863636363636363, 'f1_macro': 0.520344156087331}
INFO: 2024-10-26 10:00:32,871: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:32,874: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB
0.677 0.800 0.553
INFO: 2024-10-26 10:00:33,148: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.03s
INFO: 2024-10-26 10:00:33,497: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:34,009: llmtf.base.darumeru/RWSD: Loading Dataset: 2.68s
INFO: 2024-10-26 10:00:35,785: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.64s
INFO: 2024-10-26 10:00:35,787: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-26 10:00:35,795: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9048601269315972}
INFO: 2024-10-26 10:00:35,795: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:35,799: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/ruWorldTree
0.753 0.800 0.553 0.905
INFO: 2024-10-26 10:00:36,096: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:39,654: llmtf.base.darumeru/RWSD: Processing Dataset: 5.64s
INFO: 2024-10-26 10:00:39,655: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-26 10:00:39,660: llmtf.base.darumeru/RWSD: {'acc': 0.5343137254901961}
INFO: 2024-10-26 10:00:39,660: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:39,664: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruWorldTree
0.698 0.800 0.553 0.534 0.905
INFO: 2024-10-26 10:00:40,963: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:43,550: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:44,771: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.81s
INFO: 2024-10-26 10:00:46,698: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.15s
INFO: 2024-10-26 10:00:46,725: llmtf.base.daru/treewayextractive: Loading Dataset: 13.23s
INFO: 2024-10-26 10:01:01,719: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 30.84s
INFO: 2024-10-26 10:01:01,720: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-26 10:01:01,734: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8191580756013745, 'f1_macro': 0.8196610608491144}
INFO: 2024-10-26 10:01:01,743: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:01:01,747: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.722 0.800 0.553 0.534 0.819 0.905
INFO: 2024-10-26 10:02:41,817: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 115.10s
INFO: 2024-10-26 10:02:41,820: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-26 10:02:41,823: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9920472926206347, 'len': 0.9992233172309354, 'lcs': 1.0}
INFO: 2024-10-26 10:02:41,824: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:02:41,830: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 125.73s
INFO: 2024-10-26 10:02:41,835: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree
0.769 0.800 0.553 0.534 1.000 0.819 0.905
INFO: 2024-10-26 10:03:13,156: llmtf.base.daru/treewayabstractive: Processing Dataset: 148.37s
INFO: 2024-10-26 10:03:13,158: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-26 10:03:13,162: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3486928379990829, 'rouge2': 0.12579847916639003}
INFO: 2024-10-26 10:03:13,164: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:03:13,169: llmtf.base.evaluator:
mean daru/treewayabstractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree
0.693 0.237 0.800 0.553 0.534 1.000 0.819 0.905
INFO: 2024-10-26 10:04:19,979: llmtf.base.darumeru/MultiQ: Processing Dataset: 234.15s
INFO: 2024-10-26 10:04:19,982: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-26 10:04:19,986: llmtf.base.darumeru/MultiQ: {'f1': 0.28476692977698215, 'em': 0.17304015296367112}
INFO: 2024-10-26 10:04:19,991: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:04:19,997: llmtf.base.evaluator:
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree
0.635 0.237 0.229 0.800 0.553 0.534 1.000 0.819 0.905
INFO: 2024-10-26 10:04:32,092: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:05:58,778: llmtf.base.daru/treewayextractive: Processing Dataset: 312.04s
INFO: 2024-10-26 10:05:58,782: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-26 10:05:59,016: llmtf.base.daru/treewayextractive: {'r-prec': 0.3931765512265512}
INFO: 2024-10-26 10:05:59,058: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:05:59,066: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree
0.608 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905
INFO: 2024-10-26 10:06:34,248: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 122.16s
INFO: 2024-10-26 10:07:49,582: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 307.75s
INFO: 2024-10-26 10:07:49,584: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-26 10:07:49,630: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
abstract_algebra 0.430000
anatomy 0.577778
astronomy 0.743421
business_ethics 0.670000
clinical_knowledge 0.701887
college_biology 0.687500
college_chemistry 0.470000
college_computer_science 0.640000
college_mathematics 0.470000
college_medicine 0.589595
college_physics 0.490196
computer_security 0.710000
conceptual_physics 0.668085
econometrics 0.464912
electrical_engineering 0.579310
elementary_mathematics 0.624339
formal_logic 0.420635
global_facts 0.400000
high_school_biology 0.806452
high_school_chemistry 0.536946
high_school_computer_science 0.790000
high_school_european_history 0.763636
high_school_geography 0.777778
high_school_government_and_politics 0.715026
high_school_macroeconomics 0.653846
high_school_mathematics 0.462963
high_school_microeconomics 0.714286
high_school_physics 0.490066
high_school_psychology 0.796330
high_school_statistics 0.625000
high_school_us_history 0.754902
high_school_world_history 0.776371
human_aging 0.618834
human_sexuality 0.717557
international_law 0.702479
jurisprudence 0.685185
logical_fallacies 0.613497
machine_learning 0.446429
management 0.737864
marketing 0.799145
medical_genetics 0.650000
miscellaneous 0.717752
moral_disputes 0.604046
moral_scenarios 0.242458
nutrition 0.705882
philosophy 0.639871
prehistory 0.626543
professional_accounting 0.446809
professional_law 0.399609
professional_medicine 0.595588
professional_psychology 0.601307
public_relations 0.600000
security_studies 0.673469
sociology 0.711443
us_foreign_policy 0.800000
virology 0.500000
world_religions 0.730994
INFO: 2024-10-26 10:07:49,638: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
STEM 0.592817
humanities 0.612325
other (business, health, misc.) 0.622224
social sciences 0.685496
INFO: 2024-10-26 10:07:49,666: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6282155966274303}
INFO: 2024-10-26 10:07:49,702: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:07:49,714: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.610 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905 0.628
INFO: 2024-10-26 10:11:09,714: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 275.46s
INFO: 2024-10-26 10:11:09,718: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-26 10:11:09,763: llmtf.base.nlpcoreteam/enMMLU:
subject metric
abstract_algebra 0.510000
anatomy 0.703704
astronomy 0.848684
business_ethics 0.770000
clinical_knowledge 0.777358
college_biology 0.847222
college_chemistry 0.550000
college_computer_science 0.710000
college_mathematics 0.470000
college_medicine 0.693642
college_physics 0.519608
computer_security 0.780000
conceptual_physics 0.706383
econometrics 0.596491
electrical_engineering 0.668966
elementary_mathematics 0.666667
formal_logic 0.484127
global_facts 0.450000
high_school_biology 0.867742
high_school_chemistry 0.630542
high_school_computer_science 0.860000
high_school_european_history 0.800000
high_school_geography 0.878788
high_school_government_and_politics 0.943005
high_school_macroeconomics 0.761538
high_school_mathematics 0.551852
high_school_microeconomics 0.865546
high_school_physics 0.582781
high_school_psychology 0.882569
high_school_statistics 0.717593
high_school_us_history 0.848039
high_school_world_history 0.848101
human_aging 0.784753
human_sexuality 0.748092
international_law 0.785124
jurisprudence 0.787037
logical_fallacies 0.834356
machine_learning 0.535714
management 0.864078
marketing 0.901709
medical_genetics 0.790000
miscellaneous 0.846743
moral_disputes 0.731214
moral_scenarios 0.401117
nutrition 0.790850
philosophy 0.729904
prehistory 0.793210
professional_accounting 0.570922
professional_law 0.507171
professional_medicine 0.764706
professional_psychology 0.745098
public_relations 0.700000
security_studies 0.759184
sociology 0.845771
us_foreign_policy 0.880000
virology 0.518072
world_religions 0.847953
INFO: 2024-10-26 10:11:09,771: llmtf.base.nlpcoreteam/enMMLU:
subject metric
STEM 0.667986
humanities 0.722873
other (business, health, misc.) 0.730467
social sciences 0.800507
INFO: 2024-10-26 10:11:09,779: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7304583527816666}
INFO: 2024-10-26 10:11:09,811: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:11:09,819: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.621 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905 0.730 0.628