|
INFO: 2024-10-28 13:15:15,094: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] |
|
INFO: 2024-10-28 13:15:15,094: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:15:15,094: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:15:16,695: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] |
|
INFO: 2024-10-28 13:15:16,695: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:15:16,695: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:15:18,943: llmtf.base.darumeru/PARus: Loading Dataset: 2.25s |
|
INFO: 2024-10-28 13:15:19,297: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.20s |
|
INFO: 2024-10-28 13:15:22,318: llmtf.base.darumeru/PARus: Processing Dataset: 3.37s |
|
INFO: 2024-10-28 13:15:22,318: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-10-28 13:15:22,329: llmtf.base.darumeru/PARus: {'acc': 0.78} |
|
INFO: 2024-10-28 13:15:22,330: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:15:22,330: llmtf.base.evaluator: |
|
mean darumeru/PARus |
|
0.780 0.780 |
|
INFO: 2024-10-28 13:15:30,304: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] |
|
INFO: 2024-10-28 13:15:30,304: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:15:30,304: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:15:33,637: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.33s |
|
INFO: 2024-10-28 13:16:05,173: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 31.54s |
|
INFO: 2024-10-28 13:16:05,173: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-10-28 13:16:05,184: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8256013745704467, 'f1_macro': 0.8262484506706507} |
|
INFO: 2024-10-28 13:16:05,191: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:16:05,192: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/ruOpenBookQA |
|
0.803 0.780 0.826 |
|
INFO: 2024-10-28 13:16:13,923: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] |
|
INFO: 2024-10-28 13:16:13,923: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:16:13,923: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:16:16,429: llmtf.base.darumeru/RWSD: Loading Dataset: 2.51s |
|
INFO: 2024-10-28 13:16:22,246: llmtf.base.darumeru/RWSD: Processing Dataset: 5.82s |
|
INFO: 2024-10-28 13:16:22,246: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-10-28 13:16:22,247: llmtf.base.darumeru/RWSD: {'acc': 0.5441176470588235} |
|
INFO: 2024-10-28 13:16:22,248: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:16:22,249: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA |
|
0.717 0.780 0.544 0.826 |
|
INFO: 2024-10-28 13:16:31,348: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-10-28 13:16:31,348: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:16:31,348: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:18:38,554: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 127.21s |
|
INFO: 2024-10-28 13:20:06,478: llmtf.base.darumeru/MultiQ: Processing Dataset: 287.18s |
|
INFO: 2024-10-28 13:20:06,479: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-10-28 13:20:06,480: llmtf.base.darumeru/MultiQ: {'f1': 0.2503859074384594, 'em': 0.14531548757170173} |
|
INFO: 2024-10-28 13:20:06,488: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:20:06,489: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA |
|
0.587 0.198 0.780 0.544 0.826 |
|
INFO: 2024-10-28 13:20:15,334: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] |
|
INFO: 2024-10-28 13:20:15,335: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:20:15,335: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:20:18,179: llmtf.base.darumeru/RCB: Loading Dataset: 2.84s |
|
INFO: 2024-10-28 13:20:23,505: llmtf.base.darumeru/RCB: Processing Dataset: 5.33s |
|
INFO: 2024-10-28 13:20:23,506: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-10-28 13:20:23,510: llmtf.base.darumeru/RCB: {'acc': 0.5863636363636363, 'f1_macro': 0.5032640286161413} |
|
INFO: 2024-10-28 13:20:23,511: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:20:23,512: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA |
|
0.579 0.198 0.780 0.545 0.544 0.826 |
|
INFO: 2024-10-28 13:20:32,046: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] |
|
INFO: 2024-10-28 13:20:32,046: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:20:32,046: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:20:34,403: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.36s |
|
INFO: 2024-10-28 13:20:36,969: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.57s |
|
INFO: 2024-10-28 13:20:36,969: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-10-28 13:20:36,972: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9038817229146561} |
|
INFO: 2024-10-28 13:20:36,972: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:20:36,972: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.633 0.198 0.780 0.545 0.544 0.826 0.904 |
|
INFO: 2024-10-28 13:20:45,488: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-10-28 13:20:45,488: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:20:45,488: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:20:59,443: llmtf.base.daru/treewayextractive: Loading Dataset: 13.95s |
|
INFO: 2024-10-28 13:23:49,533: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 310.98s |
|
INFO: 2024-10-28 13:23:49,533: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-10-28 13:23:49,597: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.480000 |
|
anatomy 0.562963 |
|
astronomy 0.769737 |
|
business_ethics 0.640000 |
|
clinical_knowledge 0.671698 |
|
college_biology 0.673611 |
|
college_chemistry 0.470000 |
|
college_computer_science 0.670000 |
|
college_mathematics 0.440000 |
|
college_medicine 0.589595 |
|
college_physics 0.480392 |
|
computer_security 0.720000 |
|
conceptual_physics 0.655319 |
|
econometrics 0.482456 |
|
electrical_engineering 0.606897 |
|
elementary_mathematics 0.616402 |
|
formal_logic 0.428571 |
|
global_facts 0.370000 |
|
high_school_biology 0.809677 |
|
high_school_chemistry 0.571429 |
|
high_school_computer_science 0.770000 |
|
high_school_european_history 0.751515 |
|
high_school_geography 0.782828 |
|
high_school_government_and_politics 0.725389 |
|
high_school_macroeconomics 0.658974 |
|
high_school_mathematics 0.525926 |
|
high_school_microeconomics 0.705882 |
|
high_school_physics 0.463576 |
|
high_school_psychology 0.796330 |
|
high_school_statistics 0.606481 |
|
high_school_us_history 0.779412 |
|
high_school_world_history 0.801688 |
|
human_aging 0.632287 |
|
human_sexuality 0.717557 |
|
international_law 0.743802 |
|
jurisprudence 0.675926 |
|
logical_fallacies 0.662577 |
|
machine_learning 0.482143 |
|
management 0.747573 |
|
marketing 0.816239 |
|
medical_genetics 0.650000 |
|
miscellaneous 0.711367 |
|
moral_disputes 0.627168 |
|
moral_scenarios 0.244693 |
|
nutrition 0.689542 |
|
philosophy 0.646302 |
|
prehistory 0.660494 |
|
professional_accounting 0.439716 |
|
professional_law 0.411343 |
|
professional_medicine 0.613971 |
|
professional_psychology 0.591503 |
|
public_relations 0.545455 |
|
security_studies 0.665306 |
|
sociology 0.736318 |
|
us_foreign_policy 0.800000 |
|
virology 0.500000 |
|
world_religions 0.760234 |
|
INFO: 2024-10-28 13:23:49,606: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.600644 |
|
humanities 0.630286 |
|
other (business, health, misc.) 0.616782 |
|
social sciences 0.684000 |
|
INFO: 2024-10-28 13:23:49,611: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6329281396317665} |
|
INFO: 2024-10-28 13:23:49,646: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:23:49,648: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.633 0.198 0.780 0.545 0.544 0.826 0.904 0.633 |
|
INFO: 2024-10-28 13:23:57,887: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-10-28 13:23:57,887: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:23:57,887: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:24:02,221: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.33s |
|
INFO: 2024-10-28 13:26:15,188: llmtf.base.daru/treewayextractive: Processing Dataset: 315.74s |
|
INFO: 2024-10-28 13:26:15,188: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-10-28 13:26:15,447: llmtf.base.daru/treewayextractive: {'r-prec': 0.40380281385281386} |
|
INFO: 2024-10-28 13:26:15,501: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:26:15,503: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.604 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.633 |
|
INFO: 2024-10-28 13:26:24,206: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-10-28 13:26:24,207: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:26:24,207: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:26:48,154: llmtf.base.daru/treewayabstractive: Processing Dataset: 165.93s |
|
INFO: 2024-10-28 13:26:48,154: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-10-28 13:26:48,155: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3489002151166006, 'rouge2': 0.12404569962254197} |
|
INFO: 2024-10-28 13:26:48,156: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:26:48,157: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.563 0.236 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.633 |
|
INFO: 2024-10-28 13:28:23,832: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 119.62s |
|
INFO: 2024-10-28 13:33:05,781: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 281.95s |
|
INFO: 2024-10-28 13:33:05,781: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-10-28 13:33:05,844: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.450000 |
|
anatomy 0.725926 |
|
astronomy 0.861842 |
|
business_ethics 0.750000 |
|
clinical_knowledge 0.762264 |
|
college_biology 0.854167 |
|
college_chemistry 0.510000 |
|
college_computer_science 0.720000 |
|
college_mathematics 0.470000 |
|
college_medicine 0.699422 |
|
college_physics 0.509804 |
|
computer_security 0.770000 |
|
conceptual_physics 0.706383 |
|
econometrics 0.605263 |
|
electrical_engineering 0.696552 |
|
elementary_mathematics 0.666667 |
|
formal_logic 0.492063 |
|
global_facts 0.420000 |
|
high_school_biology 0.861290 |
|
high_school_chemistry 0.620690 |
|
high_school_computer_science 0.840000 |
|
high_school_european_history 0.824242 |
|
high_school_geography 0.873737 |
|
high_school_government_and_politics 0.927461 |
|
high_school_macroeconomics 0.761538 |
|
high_school_mathematics 0.566667 |
|
high_school_microeconomics 0.873950 |
|
high_school_physics 0.582781 |
|
high_school_psychology 0.888073 |
|
high_school_statistics 0.708333 |
|
high_school_us_history 0.838235 |
|
high_school_world_history 0.860759 |
|
human_aging 0.762332 |
|
human_sexuality 0.786260 |
|
international_law 0.809917 |
|
jurisprudence 0.796296 |
|
logical_fallacies 0.828221 |
|
machine_learning 0.526786 |
|
management 0.854369 |
|
marketing 0.914530 |
|
medical_genetics 0.810000 |
|
miscellaneous 0.848020 |
|
moral_disputes 0.736994 |
|
moral_scenarios 0.459218 |
|
nutrition 0.797386 |
|
philosophy 0.723473 |
|
prehistory 0.805556 |
|
professional_accounting 0.556738 |
|
professional_law 0.507823 |
|
professional_medicine 0.742647 |
|
professional_psychology 0.750000 |
|
public_relations 0.636364 |
|
security_studies 0.759184 |
|
sociology 0.845771 |
|
us_foreign_policy 0.850000 |
|
virology 0.506024 |
|
world_religions 0.853801 |
|
INFO: 2024-10-28 13:33:05,852: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.662331 |
|
humanities 0.733585 |
|
other (business, health, misc.) 0.724976 |
|
social sciences 0.796467 |
|
INFO: 2024-10-28 13:33:05,857: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7293395108036221} |
|
INFO: 2024-10-28 13:33:05,908: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:33:05,910: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.580 0.236 0.404 0.198 0.780 0.545 0.544 0.826 0.904 0.729 0.633 |
|
INFO: 2024-10-28 13:33:14,562: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] |
|
INFO: 2024-10-28 13:33:14,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077] |
|
INFO: 2024-10-28 13:33:14,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-28 13:33:17,057: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.49s |
|
INFO: 2024-10-28 13:35:21,669: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 124.61s |
|
INFO: 2024-10-28 13:35:21,670: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-10-28 13:35:21,670: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9953318595732386, 'len': 0.9990656928305265, 'lcs': 1.0} |
|
INFO: 2024-10-28 13:35:21,671: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-28 13:35:21,672: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.618 0.236 0.404 0.198 0.780 0.545 0.544 1.000 0.826 0.904 0.729 0.633 |
|
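The log does not say how the `mean` column or the MMLU `acc` values are aggregated. The logged numbers are consistent with plain unweighted averages: the final `mean` of 0.618 matches the average of the eleven per-task scores in the last summary row, and each MMLU `acc` matches the average of its four category scores. The sketch below only re-checks that arithmetic from values copied out of the log. The helper name `macro_mean` is hypothetical and is not part of llmtf, and the averaging rule is an assumption inferred from the numbers, not confirmed framework behavior.

```python
# Per-task scores copied from the last summary row of the log.
final_scores = {
    "daru/treewayabstractive": 0.236,
    "daru/treewayextractive": 0.404,
    "darumeru/MultiQ": 0.198,
    "darumeru/PARus": 0.780,
    "darumeru/RCB": 0.545,
    "darumeru/RWSD": 0.544,
    "darumeru/cp_para_ru": 1.000,
    "darumeru/ruOpenBookQA": 0.826,
    "darumeru/ruWorldTree": 0.904,
    "nlpcoreteam/enMMLU": 0.729,
    "nlpcoreteam/ruMMLU": 0.633,
}

# Category-level MMLU scores copied from the log.
ru_mmlu_categories = {
    "STEM": 0.600644,
    "humanities": 0.630286,
    "other (business, health, misc.)": 0.616782,
    "social sciences": 0.684000,
}
en_mmlu_categories = {
    "STEM": 0.662331,
    "humanities": 0.733585,
    "other (business, health, misc.)": 0.724976,
    "social sciences": 0.796467,
}


def macro_mean(scores):
    """Unweighted average of the values in a name -> score mapping (assumed rule)."""
    return sum(scores.values()) / len(scores)


print(f"overall mean: {macro_mean(final_scores):.3f}")
print(f"ruMMLU acc:   {macro_mean(ru_mmlu_categories):.6f}")
print(f"enMMLU acc:   {macro_mean(en_mmlu_categories):.6f}")
```

Because the inputs are the rounded values printed in the log, the recomputed MMLU accuracies agree with the logged ones only to about six decimal places.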
|