|
INFO: 2024-10-26 10:00:21,600: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] |
|
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:23,554: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] |
|
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:25,061: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] |
|
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:25,825: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.22s |
|
INFO: 2024-10-26 10:00:25,839: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s |
|
INFO: 2024-10-26 10:00:27,118: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] |
|
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:27,739: llmtf.base.darumeru/RCB: Loading Dataset: 2.68s |
|
INFO: 2024-10-26 10:00:29,086: llmtf.base.darumeru/PARus: Processing Dataset: 3.25s |
|
INFO: 2024-10-26 10:00:29,088: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-10-26 10:00:29,102: llmtf.base.darumeru/PARus: {'acc': 0.8} |
|
INFO: 2024-10-26 10:00:29,102: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:00:29,105: llmtf.base.evaluator: |
|
mean darumeru/PARus |
|
0.800 0.800 |
|
INFO: 2024-10-26 10:00:30,121: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] |
|
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:30,879: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.76s |
|
INFO: 2024-10-26 10:00:31,330: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] |
|
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:32,861: llmtf.base.darumeru/RCB: Processing Dataset: 5.12s |
|
INFO: 2024-10-26 10:00:32,862: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-10-26 10:00:32,870: llmtf.base.darumeru/RCB: {'acc': 0.5863636363636363, 'f1_macro': 0.520344156087331} |
|
INFO: 2024-10-26 10:00:32,871: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:00:32,874: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB |
|
0.677 0.800 0.553 |
|
INFO: 2024-10-26 10:00:33,148: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.03s |
|
INFO: 2024-10-26 10:00:33,497: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:34,009: llmtf.base.darumeru/RWSD: Loading Dataset: 2.68s |
|
INFO: 2024-10-26 10:00:35,785: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.64s |
|
INFO: 2024-10-26 10:00:35,787: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-10-26 10:00:35,795: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9048601269315972} |
|
INFO: 2024-10-26 10:00:35,795: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:00:35,799: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/ruWorldTree |
|
0.753 0.800 0.553 0.905 |
|
INFO: 2024-10-26 10:00:36,096: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:39,654: llmtf.base.darumeru/RWSD: Processing Dataset: 5.64s |
|
INFO: 2024-10-26 10:00:39,655: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-10-26 10:00:39,660: llmtf.base.darumeru/RWSD: {'acc': 0.5343137254901961} |
|
INFO: 2024-10-26 10:00:39,660: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:00:39,664: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruWorldTree |
|
0.698 0.800 0.553 0.534 0.905 |
|
INFO: 2024-10-26 10:00:40,963: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:43,550: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] |
|
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:00:44,771: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.81s |
|
INFO: 2024-10-26 10:00:46,698: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.15s |
|
INFO: 2024-10-26 10:00:46,725: llmtf.base.daru/treewayextractive: Loading Dataset: 13.23s |
|
INFO: 2024-10-26 10:01:01,719: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 30.84s |
|
INFO: 2024-10-26 10:01:01,720: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-10-26 10:01:01,734: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8191580756013745, 'f1_macro': 0.8196610608491144} |
|
INFO: 2024-10-26 10:01:01,743: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:01:01,747: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.722 0.800 0.553 0.534 0.819 0.905 |
|
INFO: 2024-10-26 10:02:41,817: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 115.10s |
|
INFO: 2024-10-26 10:02:41,820: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-10-26 10:02:41,823: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9920472926206347, 'len': 0.9992233172309354, 'lcs': 1.0} |
|
INFO: 2024-10-26 10:02:41,824: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:02:41,830: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 125.73s |
|
INFO: 2024-10-26 10:02:41,835: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.769 0.800 0.553 0.534 1.000 0.819 0.905 |
|
INFO: 2024-10-26 10:03:13,156: llmtf.base.daru/treewayabstractive: Processing Dataset: 148.37s |
|
INFO: 2024-10-26 10:03:13,158: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-10-26 10:03:13,162: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3486928379990829, 'rouge2': 0.12579847916639003} |
|
INFO: 2024-10-26 10:03:13,164: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:03:13,169: llmtf.base.evaluator: |
|
mean daru/treewayabstractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.693 0.237 0.800 0.553 0.534 1.000 0.819 0.905 |
|
INFO: 2024-10-26 10:04:19,979: llmtf.base.darumeru/MultiQ: Processing Dataset: 234.15s |
|
INFO: 2024-10-26 10:04:19,982: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-10-26 10:04:19,986: llmtf.base.darumeru/MultiQ: {'f1': 0.28476692977698215, 'em': 0.17304015296367112} |
|
INFO: 2024-10-26 10:04:19,991: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:04:19,997: llmtf.base.evaluator: |
|
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.635 0.237 0.229 0.800 0.553 0.534 1.000 0.819 0.905 |
|
INFO: 2024-10-26 10:04:32,092: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-26 10:05:58,778: llmtf.base.daru/treewayextractive: Processing Dataset: 312.04s |
|
INFO: 2024-10-26 10:05:58,782: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-10-26 10:05:59,016: llmtf.base.daru/treewayextractive: {'r-prec': 0.3931765512265512} |
|
INFO: 2024-10-26 10:05:59,058: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:05:59,066: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.608 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905 |
|
INFO: 2024-10-26 10:06:34,248: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 122.16s |
|
INFO: 2024-10-26 10:07:49,582: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 307.75s |
|
INFO: 2024-10-26 10:07:49,584: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-10-26 10:07:49,630: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.430000 |
|
anatomy 0.577778 |
|
astronomy 0.743421 |
|
business_ethics 0.670000 |
|
clinical_knowledge 0.701887 |
|
college_biology 0.687500 |
|
college_chemistry 0.470000 |
|
college_computer_science 0.640000 |
|
college_mathematics 0.470000 |
|
college_medicine 0.589595 |
|
college_physics 0.490196 |
|
computer_security 0.710000 |
|
conceptual_physics 0.668085 |
|
econometrics 0.464912 |
|
electrical_engineering 0.579310 |
|
elementary_mathematics 0.624339 |
|
formal_logic 0.420635 |
|
global_facts 0.400000 |
|
high_school_biology 0.806452 |
|
high_school_chemistry 0.536946 |
|
high_school_computer_science 0.790000 |
|
high_school_european_history 0.763636 |
|
high_school_geography 0.777778 |
|
high_school_government_and_politics 0.715026 |
|
high_school_macroeconomics 0.653846 |
|
high_school_mathematics 0.462963 |
|
high_school_microeconomics 0.714286 |
|
high_school_physics 0.490066 |
|
high_school_psychology 0.796330 |
|
high_school_statistics 0.625000 |
|
high_school_us_history 0.754902 |
|
high_school_world_history 0.776371 |
|
human_aging 0.618834 |
|
human_sexuality 0.717557 |
|
international_law 0.702479 |
|
jurisprudence 0.685185 |
|
logical_fallacies 0.613497 |
|
machine_learning 0.446429 |
|
management 0.737864 |
|
marketing 0.799145 |
|
medical_genetics 0.650000 |
|
miscellaneous 0.717752 |
|
moral_disputes 0.604046 |
|
moral_scenarios 0.242458 |
|
nutrition 0.705882 |
|
philosophy 0.639871 |
|
prehistory 0.626543 |
|
professional_accounting 0.446809 |
|
professional_law 0.399609 |
|
professional_medicine 0.595588 |
|
professional_psychology 0.601307 |
|
public_relations 0.600000 |
|
security_studies 0.673469 |
|
sociology 0.711443 |
|
us_foreign_policy 0.800000 |
|
virology 0.500000 |
|
world_religions 0.730994 |
|
INFO: 2024-10-26 10:07:49,638: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.592817 |
|
humanities 0.612325 |
|
other (business, health, misc.) 0.622224 |
|
social sciences 0.685496 |
|
INFO: 2024-10-26 10:07:49,666: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6282155966274303} |
|
INFO: 2024-10-26 10:07:49,702: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:07:49,714: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.610 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905 0.628 |
|
INFO: 2024-10-26 10:11:09,714: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 275.46s |
|
INFO: 2024-10-26 10:11:09,718: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-10-26 10:11:09,763: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.510000 |
|
anatomy 0.703704 |
|
astronomy 0.848684 |
|
business_ethics 0.770000 |
|
clinical_knowledge 0.777358 |
|
college_biology 0.847222 |
|
college_chemistry 0.550000 |
|
college_computer_science 0.710000 |
|
college_mathematics 0.470000 |
|
college_medicine 0.693642 |
|
college_physics 0.519608 |
|
computer_security 0.780000 |
|
conceptual_physics 0.706383 |
|
econometrics 0.596491 |
|
electrical_engineering 0.668966 |
|
elementary_mathematics 0.666667 |
|
formal_logic 0.484127 |
|
global_facts 0.450000 |
|
high_school_biology 0.867742 |
|
high_school_chemistry 0.630542 |
|
high_school_computer_science 0.860000 |
|
high_school_european_history 0.800000 |
|
high_school_geography 0.878788 |
|
high_school_government_and_politics 0.943005 |
|
high_school_macroeconomics 0.761538 |
|
high_school_mathematics 0.551852 |
|
high_school_microeconomics 0.865546 |
|
high_school_physics 0.582781 |
|
high_school_psychology 0.882569 |
|
high_school_statistics 0.717593 |
|
high_school_us_history 0.848039 |
|
high_school_world_history 0.848101 |
|
human_aging 0.784753 |
|
human_sexuality 0.748092 |
|
international_law 0.785124 |
|
jurisprudence 0.787037 |
|
logical_fallacies 0.834356 |
|
machine_learning 0.535714 |
|
management 0.864078 |
|
marketing 0.901709 |
|
medical_genetics 0.790000 |
|
miscellaneous 0.846743 |
|
moral_disputes 0.731214 |
|
moral_scenarios 0.401117 |
|
nutrition 0.790850 |
|
philosophy 0.729904 |
|
prehistory 0.793210 |
|
professional_accounting 0.570922 |
|
professional_law 0.507171 |
|
professional_medicine 0.764706 |
|
professional_psychology 0.745098 |
|
public_relations 0.700000 |
|
security_studies 0.759184 |
|
sociology 0.845771 |
|
us_foreign_policy 0.880000 |
|
virology 0.518072 |
|
world_religions 0.847953 |
|
INFO: 2024-10-26 10:11:09,771: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.667986 |
|
humanities 0.722873 |
|
other (business, health, misc.) 0.730467 |
|
social sciences 0.800507 |
|
INFO: 2024-10-26 10:11:09,779: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7304583527816666} |
|
INFO: 2024-10-26 10:11:09,811: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-26 10:11:09,819: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.621 0.237 0.393 0.229 0.800 0.553 0.534 1.000 0.819 0.905 0.730 0.628 |
|
|