|
INFO: 2024-10-17 21:30:14,019: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] |
|
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:30:14,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:30:20,481: llmtf.base.darumeru/MultiQ: Loading Dataset: 6.46s |
|
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Processing Dataset: 339.11s |
|
INFO: 2024-10-17 21:35:59,593: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-10-17 21:35:59,594: llmtf.base.darumeru/MultiQ: {'f1': 0.3346248767848689, 'em': 0.22275334608030592} |
|
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:35:59,599: llmtf.base.evaluator: |
|
mean darumeru/MultiQ |
|
0.279 0.279 |
|
INFO: 2024-10-17 21:36:08,809: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] |
|
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:36:08,810: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:36:12,969: llmtf.base.darumeru/PARus: Loading Dataset: 4.16s |
|
INFO: 2024-10-17 21:36:18,316: llmtf.base.darumeru/PARus: Processing Dataset: 5.35s |
|
INFO: 2024-10-17 21:36:18,317: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-10-17 21:36:18,327: llmtf.base.darumeru/PARus: {'acc': 0.7} |
|
INFO: 2024-10-17 21:36:18,327: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:36:18,328: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus |
|
0.489 0.279 0.700 |
|
INFO: 2024-10-17 21:36:27,550: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] |
|
INFO: 2024-10-17 21:36:27,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:36:27,551: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:36:31,450: llmtf.base.darumeru/RCB: Loading Dataset: 3.90s |
|
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Processing Dataset: 7.23s |
|
INFO: 2024-10-17 21:36:38,683: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-10-17 21:36:38,686: llmtf.base.darumeru/RCB: {'acc': 0.5454545454545454, 'f1_macro': 0.49090309951702227} |
|
INFO: 2024-10-17 21:36:38,687: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:36:38,688: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB |
|
0.499 0.279 0.700 0.518 |
|
INFO: 2024-10-17 21:36:48,734: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] |
|
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:36:48,735: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:36:54,900: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 6.17s |
|
INFO: 2024-10-17 21:38:00,519: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 65.62s |
|
INFO: 2024-10-17 21:38:00,520: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-10-17 21:38:00,532: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7302405498281787, 'f1_macro': 0.7304546157096631} |
|
INFO: 2024-10-17 21:38:00,541: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:38:00,542: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA |
|
0.557 0.279 0.700 0.518 0.730 |
|
INFO: 2024-10-17 21:38:09,745: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] |
|
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:38:09,745: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:38:14,102: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 4.36s |
|
INFO: 2024-10-17 21:38:16,932: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.83s |
|
INFO: 2024-10-17 21:38:16,933: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-10-17 21:38:16,936: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9043404138496471} |
|
INFO: 2024-10-17 21:38:16,936: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:38:16,937: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.626 0.279 0.700 0.518 0.730 0.905 |
|
INFO: 2024-10-17 21:38:26,077: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] |
|
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:38:26,077: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:38:30,781: llmtf.base.darumeru/RWSD: Loading Dataset: 4.70s |
|
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Processing Dataset: 5.72s |
|
INFO: 2024-10-17 21:38:36,497: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-10-17 21:38:36,498: llmtf.base.darumeru/RWSD: {'acc': 0.6029411764705882} |
|
INFO: 2024-10-17 21:38:36,499: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:38:36,500: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.622 0.279 0.700 0.518 0.603 0.730 0.905 |
|
INFO: 2024-10-17 21:38:45,688: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:38:45,688: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:39:02,002: llmtf.base.daru/treewayextractive: Loading Dataset: 16.31s |
|
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Processing Dataset: 183.77s |
|
INFO: 2024-10-17 21:42:05,777: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-10-17 21:42:06,010: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917218614718615} |
|
INFO: 2024-10-17 21:42:06,052: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:42:06,054: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.589 0.392 0.279 0.700 0.518 0.603 0.730 0.905 |
|
INFO: 2024-10-17 21:42:15,170: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:42:15,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:46:47,282: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 272.11s |
|
INFO: 2024-10-17 21:56:29,398: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 582.12s |
|
INFO: 2024-10-17 21:56:29,399: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-10-17 21:56:29,464: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.340000 |
|
anatomy 0.414815 |
|
astronomy 0.611842 |
|
business_ethics 0.610000 |
|
clinical_knowledge 0.554717 |
|
college_biology 0.548611 |
|
college_chemistry 0.380000 |
|
college_computer_science 0.450000 |
|
college_mathematics 0.400000 |
|
college_medicine 0.526012 |
|
college_physics 0.470588 |
|
computer_security 0.620000 |
|
conceptual_physics 0.565957 |
|
econometrics 0.377193 |
|
electrical_engineering 0.537931 |
|
elementary_mathematics 0.529101 |
|
formal_logic 0.365079 |
|
global_facts 0.360000 |
|
high_school_biology 0.664516 |
|
high_school_chemistry 0.487685 |
|
high_school_computer_science 0.700000 |
|
high_school_european_history 0.751515 |
|
high_school_geography 0.722222 |
|
high_school_government_and_politics 0.564767 |
|
high_school_macroeconomics 0.528205 |
|
high_school_mathematics 0.433333 |
|
high_school_microeconomics 0.533613 |
|
high_school_physics 0.403974 |
|
high_school_psychology 0.713761 |
|
high_school_statistics 0.523148 |
|
high_school_us_history 0.661765 |
|
high_school_world_history 0.717300 |
|
human_aging 0.587444 |
|
human_sexuality 0.618321 |
|
international_law 0.735537 |
|
jurisprudence 0.666667 |
|
logical_fallacies 0.564417 |
|
machine_learning 0.392857 |
|
management 0.650485 |
|
marketing 0.752137 |
|
medical_genetics 0.580000 |
|
miscellaneous 0.632184 |
|
moral_disputes 0.583815 |
|
moral_scenarios 0.299441 |
|
nutrition 0.637255 |
|
philosophy 0.617363 |
|
prehistory 0.561728 |
|
professional_accounting 0.386525 |
|
professional_law 0.377445 |
|
professional_medicine 0.481618 |
|
professional_psychology 0.516340 |
|
public_relations 0.500000 |
|
security_studies 0.648980 |
|
sociology 0.756219 |
|
us_foreign_policy 0.720000 |
|
virology 0.439759 |
|
world_religions 0.719298 |
|
INFO: 2024-10-17 21:56:29,473: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.503308 |
|
humanities 0.586259 |
|
other (business, health, misc.) 0.543782 |
|
social sciences 0.599968 |
|
INFO: 2024-10-17 21:56:29,478: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5583294528508019} |
|
INFO: 2024-10-17 21:56:29,516: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 21:56:29,518: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.586 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.558 |
|
INFO: 2024-10-17 21:56:39,535: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 21:56:39,536: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 21:58:54,966: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 135.43s |
|
INFO: 2024-10-17 22:08:04,419: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 549.45s |
|
INFO: 2024-10-17 22:08:04,426: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-10-17 22:08:04,492: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.380000 |
|
anatomy 0.637037 |
|
astronomy 0.717105 |
|
business_ethics 0.700000 |
|
clinical_knowledge 0.705660 |
|
college_biology 0.715278 |
|
college_chemistry 0.470000 |
|
college_computer_science 0.580000 |
|
college_mathematics 0.330000 |
|
college_medicine 0.664740 |
|
college_physics 0.509804 |
|
computer_security 0.740000 |
|
conceptual_physics 0.642553 |
|
econometrics 0.508772 |
|
electrical_engineering 0.600000 |
|
elementary_mathematics 0.547619 |
|
formal_logic 0.412698 |
|
global_facts 0.360000 |
|
high_school_biology 0.783871 |
|
high_school_chemistry 0.581281 |
|
high_school_computer_science 0.710000 |
|
high_school_european_history 0.800000 |
|
high_school_geography 0.757576 |
|
high_school_government_and_politics 0.854922 |
|
high_school_macroeconomics 0.679487 |
|
high_school_mathematics 0.455556 |
|
high_school_microeconomics 0.773109 |
|
high_school_physics 0.437086 |
|
high_school_psychology 0.844037 |
|
high_school_statistics 0.652778 |
|
high_school_us_history 0.833333 |
|
high_school_world_history 0.843882 |
|
human_aging 0.677130 |
|
human_sexuality 0.786260 |
|
international_law 0.768595 |
|
jurisprudence 0.814815 |
|
logical_fallacies 0.803681 |
|
machine_learning 0.446429 |
|
management 0.786408 |
|
marketing 0.858974 |
|
medical_genetics 0.760000 |
|
miscellaneous 0.795658 |
|
moral_disputes 0.667630 |
|
moral_scenarios 0.311732 |
|
nutrition 0.732026 |
|
philosophy 0.704180 |
|
prehistory 0.712963 |
|
professional_accounting 0.503546 |
|
professional_law 0.457627 |
|
professional_medicine 0.658088 |
|
professional_psychology 0.668301 |
|
public_relations 0.709091 |
|
security_studies 0.697959 |
|
sociology 0.800995 |
|
us_foreign_policy 0.800000 |
|
virology 0.506024 |
|
world_religions 0.801170 |
|
INFO: 2024-10-17 22:08:04,506: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.572187 |
|
humanities 0.687100 |
|
other (business, health, misc.) 0.667521 |
|
social sciences 0.740042 |
|
INFO: 2024-10-17 22:08:04,511: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6667125709237595} |
|
INFO: 2024-10-17 22:08:04,554: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 22:08:04,556: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.595 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558 |
|
INFO: 2024-10-17 22:08:14,512: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 22:08:14,513: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 22:08:18,791: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.28s |
|
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Processing Dataset: 207.47s |
|
INFO: 2024-10-17 22:11:46,260: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-10-17 22:11:46,261: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33109987599556284, 'rouge2': 0.11202889150257295} |
|
INFO: 2024-10-17 22:11:46,262: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 22:11:46,263: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.557 0.222 0.392 0.279 0.700 0.518 0.603 0.730 0.905 0.667 0.558 |
|
INFO: 2024-10-17 22:11:55,717: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] |
|
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-17 22:11:55,717: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-17 22:11:59,846: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.13s |
|
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 150.13s |
|
INFO: 2024-10-17 22:14:29,975: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-10-17 22:14:29,976: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.993754090002875, 'len': 0.9986883734384026, 'lcs': 0.98} |
|
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-17 22:14:29,977: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.596 0.222 0.392 0.279 0.700 0.518 0.603 0.980 0.730 0.905 0.667 0.558 |
|
|