|
INFO: 2024-08-28 09:38:05,098: llmtf.base.evaluator: Starting eval on ['darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'russiannlp/rucola_custom'] |
|
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:05,099: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:07,960: llmtf.base.darumeru/PARus: Loading Dataset: 2.86s |
|
INFO: 2024-08-28 09:38:07,992: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu', 'daru/treewayextractive'] |
|
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:07,992: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:09,381: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu', 'nlpcoreteam/enmmlu'] |
|
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:09,381: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Processing Dataset: 3.52s |
|
INFO: 2024-08-28 09:38:11,476: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-08-28 09:38:11,487: llmtf.base.darumeru/PARus: {'acc': 0.61} |
|
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:11,487: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:13,100: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:13,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:13,501: llmtf.base.darumeru/RCB: Loading Dataset: 2.01s |
|
INFO: 2024-08-28 09:38:13,809: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/use'] |
|
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:13,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:15,664: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_para_ru'] |
|
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:15,664: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:16,575: llmtf.base.darumeru/ruMMLU: Loading Dataset: 8.58s |
|
INFO: 2024-08-28 09:38:17,521: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s |
|
INFO: 2024-08-28 09:38:17,560: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.46s |
|
INFO: 2024-08-28 09:38:17,771: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_en', 'darumeru/cp_para_en'] |
|
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:17,772: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:18,163: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 2.50s |
|
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Processing Dataset: 5.47s |
|
INFO: 2024-08-28 09:38:18,968: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-08-28 09:38:18,971: llmtf.base.darumeru/RCB: {'acc': 0.4590909090909091, 'f1_macro': 0.41511023060616065} |
|
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:18,972: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:38:20,244: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 2.47s |
|
INFO: 2024-08-28 09:38:21,566: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 2.59s |
|
INFO: 2024-08-28 09:38:59,395: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 37.83s |
|
INFO: 2024-08-28 09:38:59,396: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-08-28 09:38:59,405: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7246563573883161, 'f1_macro': 0.7254261079279148} |
|
INFO: 2024-08-28 09:38:59,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:38:59,413: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:39:01,164: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 1.75s |
|
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.01s |
|
INFO: 2024-08-28 09:39:03,177: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-08-28 09:39:03,179: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8640387481371088} |
|
INFO: 2024-08-28 09:39:03,179: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:39:03,180: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:39:05,009: llmtf.base.darumeru/RWSD: Loading Dataset: 1.83s |
|
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Processing Dataset: 5.96s |
|
INFO: 2024-08-28 09:39:10,969: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-08-28 09:39:10,970: llmtf.base.darumeru/RWSD: {'acc': 0.5686274509803921} |
|
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:39:10,971: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:39:14,992: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 4.02s |
|
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 43.71s |
|
INFO: 2024-08-28 09:39:58,707: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
|
INFO: 2024-08-28 09:39:58,716: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7100825260136348, 'mcc': 0.2607217783495962} |
|
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:39:58,720: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom |
|
0.615 0.610 0.437 0.569 0.725 0.865 0.485 |
|
INFO: 2024-08-28 09:41:10,442: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 170.20s |
|
INFO: 2024-08-28 09:41:10,443: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
|
INFO: 2024-08-28 09:41:10,444: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.394786509645572, 'len': 0.9984205596649908, 'lcs': 0.9939024390243902} |
|
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:41:10,445: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:41:12,462: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.02s |
|
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 177.42s |
|
INFO: 2024-08-28 09:41:15,580: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
|
INFO: 2024-08-28 09:41:15,581: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.705603375484548, 'len': 0.9949612573353719, 'lcs': 0.9404517453798767} |
|
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:41:15,582: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:41:17,523: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 1.94s |
|
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Processing Dataset: 143.27s |
|
INFO: 2024-08-28 09:43:35,734: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
|
INFO: 2024-08-28 09:43:35,735: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.521261233892721, 'len': 0.9996383092887717, 'lcs': 1.0} |
|
INFO: 2024-08-28 09:43:35,735: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:43:35,736: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom |
|
0.743 0.610 0.437 0.569 1.000 0.998 0.995 0.725 0.865 0.485 |
|
INFO: 2024-08-28 09:43:52,340: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 342.96s |
|
INFO: 2024-08-28 09:43:56,137: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 158.61s |
|
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-08-28 09:43:56,138: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7865670306157933, 'len': 0.9983390643026695, 'lcs': 0.97} |
|
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:43:56,139: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom |
|
0.765 0.610 0.437 0.569 1.000 0.970 0.998 0.995 0.725 0.865 0.485 |
|
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Processing Dataset: 355.94s |
|
INFO: 2024-08-28 09:44:12,518: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
|
INFO: 2024-08-28 09:44:12,524: llmtf.base.darumeru/ruMMLU: {'acc': 0.5018457547640427} |
|
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:44:12,564: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:44:25,707: llmtf.base.daru/treewayextractive: Loading Dataset: 13.14s |
|
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Processing Dataset: 376.40s |
|
INFO: 2024-08-28 09:44:33,920: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-08-28 09:44:33,922: llmtf.base.darumeru/MultiQ: {'f1': 0.2660316706577536, 'em': 0.15487571701720843} |
|
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:44:33,927: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:44:37,006: llmtf.base.darumeru/USE: Loading Dataset: 3.08s |
|
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Processing Dataset: 452.34s |
|
INFO: 2024-08-28 09:45:49,905: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-08-28 09:45:49,906: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3550759319426331, 'rouge2': 0.12663323877762525} |
|
INFO: 2024-08-28 09:45:49,909: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:45:49,910: llmtf.base.evaluator: |
|
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom |
|
0.662 0.241 0.210 0.610 0.437 0.569 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.485 |
|
INFO: 2024-08-28 09:48:00,114: llmtf.base.darumeru/USE: Processing Dataset: 203.11s |
|
INFO: 2024-08-28 09:48:00,115: llmtf.base.darumeru/USE: Results for darumeru/USE: |
|
INFO: 2024-08-28 09:48:00,116: llmtf.base.darumeru/USE: {'grade_norm': 0.0931372549019608} |
|
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:48:00,119: llmtf.base.evaluator: |
|
mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom |
|
0.622 0.241 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.485 |
|
INFO: 2024-08-28 09:49:59,058: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.72s |
|
INFO: 2024-08-28 09:49:59,059: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-08-28 09:49:59,121: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.270000 |
|
anatomy 0.400000 |
|
astronomy 0.677632 |
|
business_ethics 0.560000 |
|
clinical_knowledge 0.584906 |
|
college_biology 0.506944 |
|
college_chemistry 0.420000 |
|
college_computer_science 0.430000 |
|
college_mathematics 0.310000 |
|
college_medicine 0.549133 |
|
college_physics 0.352941 |
|
computer_security 0.530000 |
|
conceptual_physics 0.493617 |
|
econometrics 0.377193 |
|
electrical_engineering 0.503448 |
|
elementary_mathematics 0.362434 |
|
formal_logic 0.404762 |
|
global_facts 0.310000 |
|
high_school_biology 0.674194 |
|
high_school_chemistry 0.413793 |
|
high_school_computer_science 0.620000 |
|
high_school_european_history 0.709091 |
|
high_school_geography 0.686869 |
|
high_school_government_and_politics 0.616580 |
|
high_school_macroeconomics 0.507692 |
|
high_school_mathematics 0.355556 |
|
high_school_microeconomics 0.516807 |
|
high_school_physics 0.377483 |
|
high_school_psychology 0.684404 |
|
high_school_statistics 0.467593 |
|
high_school_us_history 0.686275 |
|
high_school_world_history 0.725738 |
|
human_aging 0.529148 |
|
human_sexuality 0.610687 |
|
international_law 0.652893 |
|
jurisprudence 0.601852 |
|
logical_fallacies 0.509202 |
|
machine_learning 0.330357 |
|
management 0.669903 |
|
marketing 0.722222 |
|
medical_genetics 0.550000 |
|
miscellaneous 0.627075 |
|
moral_disputes 0.572254 |
|
moral_scenarios 0.218994 |
|
nutrition 0.601307 |
|
philosophy 0.598071 |
|
prehistory 0.533951 |
|
professional_accounting 0.382979 |
|
professional_law 0.355280 |
|
professional_medicine 0.533088 |
|
professional_psychology 0.486928 |
|
public_relations 0.563636 |
|
security_studies 0.616327 |
|
sociology 0.726368 |
|
us_foreign_policy 0.730000 |
|
virology 0.463855 |
|
world_religions 0.678363 |
|
INFO: 2024-08-28 09:49:59,130: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.449777 |
|
humanities 0.557440 |
|
other (business, health, misc.) 0.534544 |
|
social sciences 0.593624 |
|
INFO: 2024-08-28 09:49:59,135: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.53384650825764} |
|
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [174570] |
|
INFO: 2024-08-28 09:49:59,170: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|eot_id|>'] |
|
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Processing Dataset: 377.93s |
|
INFO: 2024-08-28 09:50:43,640: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-08-28 09:50:43,882: llmtf.base.daru/treewayextractive: {'r-prec': 0.4306020202020202} |
|
INFO: 2024-08-28 09:50:43,926: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:50:43,927: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.604 0.241 0.431 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.534 0.485 |
|
INFO: 2024-08-28 09:51:38,740: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 99.57s |
|
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 320.48s |
|
INFO: 2024-08-28 09:56:59,217: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-08-28 09:56:59,279: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.310000 |
|
anatomy 0.622222 |
|
astronomy 0.723684 |
|
business_ethics 0.600000 |
|
clinical_knowledge 0.720755 |
|
college_biology 0.777778 |
|
college_chemistry 0.460000 |
|
college_computer_science 0.550000 |
|
college_mathematics 0.390000 |
|
college_medicine 0.647399 |
|
college_physics 0.421569 |
|
computer_security 0.780000 |
|
conceptual_physics 0.561702 |
|
econometrics 0.412281 |
|
electrical_engineering 0.572414 |
|
elementary_mathematics 0.447090 |
|
formal_logic 0.523810 |
|
global_facts 0.440000 |
|
high_school_biology 0.803226 |
|
high_school_chemistry 0.512315 |
|
high_school_computer_science 0.710000 |
|
high_school_european_history 0.727273 |
|
high_school_geography 0.772727 |
|
high_school_government_and_politics 0.849741 |
|
high_school_macroeconomics 0.661538 |
|
high_school_mathematics 0.374074 |
|
high_school_microeconomics 0.747899 |
|
high_school_physics 0.417219 |
|
high_school_psychology 0.840367 |
|
high_school_statistics 0.611111 |
|
high_school_us_history 0.789216 |
|
high_school_world_history 0.818565 |
|
human_aging 0.677130 |
|
human_sexuality 0.763359 |
|
international_law 0.743802 |
|
jurisprudence 0.777778 |
|
logical_fallacies 0.773006 |
|
machine_learning 0.464286 |
|
management 0.825243 |
|
marketing 0.854701 |
|
medical_genetics 0.760000 |
|
miscellaneous 0.817369 |
|
moral_disputes 0.664740 |
|
moral_scenarios 0.448045 |
|
nutrition 0.735294 |
|
philosophy 0.678457 |
|
prehistory 0.682099 |
|
professional_accounting 0.524823 |
|
professional_law 0.453716 |
|
professional_medicine 0.764706 |
|
professional_psychology 0.643791 |
|
public_relations 0.654545 |
|
security_studies 0.685714 |
|
sociology 0.830846 |
|
us_foreign_policy 0.820000 |
|
virology 0.500000 |
|
world_religions 0.789474 |
|
INFO: 2024-08-28 09:56:59,286: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.549248 |
|
humanities 0.682306 |
|
other (business, health, misc.) 0.677832 |
|
social sciences 0.723567 |
|
INFO: 2024-08-28 09:56:59,291: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6582382725054904} |
|
INFO: 2024-08-28 09:56:59,323: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-08-28 09:56:59,325: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.607 0.241 0.431 0.210 0.610 0.437 0.569 0.093 1.000 0.970 0.998 0.995 0.502 0.725 0.865 0.658 0.534 0.485 |
|
|