INFO: 2024-10-26 10:00:21,600: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:21,601: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:23,554: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:23,554: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:25,061: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:25,062: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:25,825: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.22s
INFO: 2024-10-26 10:00:25,839: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s
INFO: 2024-10-26 10:00:27,118: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:27,118: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:27,739: llmtf.base.darumeru/RCB: Loading Dataset: 2.68s
INFO: 2024-10-26 10:00:29,086: llmtf.base.darumeru/PARus: Processing Dataset: 3.25s
INFO: 2024-10-26 10:00:29,088: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-26 10:00:29,102: llmtf.base.darumeru/PARus: {'acc': 0.8}
INFO: 2024-10-26 10:00:29,102: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:29,105: llmtf.base.evaluator:
mean   darumeru/PARus
0.800  0.800
INFO: 2024-10-26 10:00:30,121: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:30,122: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:30,879: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.76s
INFO: 2024-10-26 10:00:31,330: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:31,330: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:32,861: llmtf.base.darumeru/RCB: Processing Dataset: 5.12s
INFO: 2024-10-26 10:00:32,862: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-26 10:00:32,870: llmtf.base.darumeru/RCB: {'acc': 0.5863636363636363, 'f1_macro': 0.520344156087331}
INFO: 2024-10-26 10:00:32,871: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:32,874: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB
0.677  0.800           0.553
INFO: 2024-10-26 10:00:33,148: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.03s
INFO: 2024-10-26 10:00:33,497: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:33,498: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:34,009: llmtf.base.darumeru/RWSD: Loading Dataset: 2.68s
INFO: 2024-10-26 10:00:35,785: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.64s
INFO: 2024-10-26 10:00:35,787: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-26 10:00:35,795: llmtf.base.darumeru/ruWorldTree: {'acc': 0.9047619047619048, 'f1_macro': 0.9048601269315972}
INFO: 2024-10-26 10:00:35,795: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:35,799: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/ruWorldTree
0.753  0.800           0.553         0.905
INFO: 2024-10-26 10:00:36,096: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:36,096: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:39,654: llmtf.base.darumeru/RWSD: Processing Dataset: 5.64s
INFO: 2024-10-26 10:00:39,655: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-26 10:00:39,660: llmtf.base.darumeru/RWSD: {'acc': 0.5343137254901961}
INFO: 2024-10-26 10:00:39,660: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:00:39,664: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruWorldTree
0.698  0.800           0.553         0.534          0.905
INFO: 2024-10-26 10:00:40,963: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:40,963: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:43,550: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:00:43,550: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:00:44,771: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.81s
INFO: 2024-10-26 10:00:46,698: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.15s
INFO: 2024-10-26 10:00:46,725: llmtf.base.daru/treewayextractive: Loading Dataset: 13.23s
INFO: 2024-10-26 10:01:01,719: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 30.84s
INFO: 2024-10-26 10:01:01,720: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-26 10:01:01,734: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8191580756013745, 'f1_macro': 0.8196610608491144}
INFO: 2024-10-26 10:01:01,743: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:01:01,747: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.722  0.800           0.553         0.534          0.819                  0.905
INFO: 2024-10-26 10:02:41,817: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 115.10s
INFO: 2024-10-26 10:02:41,820: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-26 10:02:41,823: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9920472926206347, 'len': 0.9992233172309354, 'lcs': 1.0}
INFO: 2024-10-26 10:02:41,824: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:02:41,830: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 125.73s
INFO: 2024-10-26 10:02:41,835: llmtf.base.evaluator:
mean   darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.769  0.800           0.553         0.534          1.000                0.819                  0.905
INFO: 2024-10-26 10:03:13,156: llmtf.base.daru/treewayabstractive: Processing Dataset: 148.37s
INFO: 2024-10-26 10:03:13,158: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-26 10:03:13,162: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3486928379990829, 'rouge2': 0.12579847916639003}
INFO: 2024-10-26 10:03:13,164: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:03:13,169: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.693  0.237                    0.800           0.553         0.534          1.000                0.819                  0.905
INFO: 2024-10-26 10:04:19,979: llmtf.base.darumeru/MultiQ: Processing Dataset: 234.15s
INFO: 2024-10-26 10:04:19,982: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-26 10:04:19,986: llmtf.base.darumeru/MultiQ: {'f1': 0.28476692977698215, 'em': 0.17304015296367112}
INFO: 2024-10-26 10:04:19,991: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:04:19,997: llmtf.base.evaluator:
mean   daru/treewayabstractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.635  0.237                    0.229            0.800           0.553         0.534          1.000                0.819                  0.905
INFO: 2024-10-26 10:04:32,092: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147077]
INFO: 2024-10-26 10:04:32,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-26 10:05:58,778: llmtf.base.daru/treewayextractive: Processing Dataset: 312.04s
INFO: 2024-10-26 10:05:58,782: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-26 10:05:59,016: llmtf.base.daru/treewayextractive: {'r-prec': 0.3931765512265512}
INFO: 2024-10-26 10:05:59,058: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:05:59,066: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree
0.608  0.237                    0.393                   0.229            0.800           0.553         0.534          1.000                0.819                  0.905
INFO: 2024-10-26 10:06:34,248: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 122.16s
INFO: 2024-10-26 10:07:49,582: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 307.75s
INFO: 2024-10-26 10:07:49,584: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-26 10:07:49,630: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.430000
anatomy 0.577778
astronomy 0.743421
business_ethics 0.670000
clinical_knowledge 0.701887
college_biology 0.687500
college_chemistry 0.470000
college_computer_science 0.640000
college_mathematics 0.470000
college_medicine 0.589595
college_physics 0.490196
computer_security 0.710000
conceptual_physics 0.668085
econometrics 0.464912
electrical_engineering 0.579310
elementary_mathematics 0.624339
formal_logic 0.420635
global_facts 0.400000
high_school_biology 0.806452
high_school_chemistry 0.536946
high_school_computer_science 0.790000
high_school_european_history 0.763636
high_school_geography 0.777778
high_school_government_and_politics 0.715026
high_school_macroeconomics 0.653846
high_school_mathematics 0.462963
high_school_microeconomics 0.714286
high_school_physics 0.490066
high_school_psychology 0.796330
high_school_statistics 0.625000
high_school_us_history 0.754902
high_school_world_history 0.776371
human_aging 0.618834
human_sexuality 0.717557
international_law 0.702479
jurisprudence 0.685185
logical_fallacies 0.613497
machine_learning 0.446429
management 0.737864
marketing 0.799145
medical_genetics 0.650000
miscellaneous 0.717752
moral_disputes 0.604046
moral_scenarios 0.242458
nutrition 0.705882
philosophy 0.639871
prehistory 0.626543
professional_accounting 0.446809
professional_law 0.399609
professional_medicine 0.595588
professional_psychology 0.601307
public_relations 0.600000
security_studies 0.673469
sociology 0.711443
us_foreign_policy 0.800000
virology 0.500000
world_religions 0.730994
INFO: 2024-10-26 10:07:49,638: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.592817
humanities 0.612325
other (business, health, misc.) 0.622224
social sciences 0.685496
INFO: 2024-10-26 10:07:49,666: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6282155966274303}
INFO: 2024-10-26 10:07:49,702: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:07:49,714: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.610  0.237                    0.393                   0.229            0.800           0.553         0.534          1.000                0.819                  0.905                 0.628
INFO: 2024-10-26 10:11:09,714: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 275.46s
INFO: 2024-10-26 10:11:09,718: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-26 10:11:09,763: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.510000
anatomy 0.703704
astronomy 0.848684
business_ethics 0.770000
clinical_knowledge 0.777358
college_biology 0.847222
college_chemistry 0.550000
college_computer_science 0.710000
college_mathematics 0.470000
college_medicine 0.693642
college_physics 0.519608
computer_security 0.780000
conceptual_physics 0.706383
econometrics 0.596491
electrical_engineering 0.668966
elementary_mathematics 0.666667
formal_logic 0.484127
global_facts 0.450000
high_school_biology 0.867742
high_school_chemistry 0.630542
high_school_computer_science 0.860000
high_school_european_history 0.800000
high_school_geography 0.878788
high_school_government_and_politics 0.943005
high_school_macroeconomics 0.761538
high_school_mathematics 0.551852
high_school_microeconomics 0.865546
high_school_physics 0.582781
high_school_psychology 0.882569
high_school_statistics 0.717593
high_school_us_history 0.848039
high_school_world_history 0.848101
human_aging 0.784753
human_sexuality 0.748092
international_law 0.785124
jurisprudence 0.787037
logical_fallacies 0.834356
machine_learning 0.535714
management 0.864078
marketing 0.901709
medical_genetics 0.790000
miscellaneous 0.846743
moral_disputes 0.731214
moral_scenarios 0.401117
nutrition 0.790850
philosophy 0.729904
prehistory 0.793210
professional_accounting 0.570922
professional_law 0.507171
professional_medicine 0.764706
professional_psychology 0.745098
public_relations 0.700000
security_studies 0.759184
sociology 0.845771
us_foreign_policy 0.880000
virology 0.518072
world_religions 0.847953
INFO: 2024-10-26 10:11:09,771: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.667986
humanities 0.722873
other (business, health, misc.) 0.730467
social sciences 0.800507
INFO: 2024-10-26 10:11:09,779: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7304583527816666}
INFO: 2024-10-26 10:11:09,811: llmtf.base.evaluator: Ended eval
INFO: 2024-10-26 10:11:09,819: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.621  0.237                    0.393                   0.229            0.800           0.553         0.534          1.000                0.819                  0.905                 0.730               0.628
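The `mean` column in the summary tables can be reproduced from the metric dictionaries logged on the "Results for" lines. The sketch below is an assumption and not llmtf's own code. It takes each task's score as the plain average of its logged metrics, and this reproduces every logged column to three decimals (for example, darumeru/RCB: (0.5864 + 0.5203) / 2 ≈ 0.553). For darumeru/cp_para_ru, an average that includes `symbol_per_token` would come to about 2.0, so only `lcs` is kept here. Averaging `len` and `lcs` would also round to 1.000, so the log alone does not settle which metrics are used for that task.

```python
from statistics import mean

# Metric values copied verbatim from the "Results for ..." log lines.
# ASSUMPTION: a task's summary score is the mean of its logged metrics;
# for darumeru/cp_para_ru only `lcs` is kept (see the note above).
results = {
    "daru/treewayabstractive": {"rouge1": 0.3486928379990829, "rouge2": 0.12579847916639003},
    "daru/treewayextractive": {"r-prec": 0.3931765512265512},
    "darumeru/MultiQ": {"f1": 0.28476692977698215, "em": 0.17304015296367112},
    "darumeru/PARus": {"acc": 0.8},
    "darumeru/RCB": {"acc": 0.5863636363636363, "f1_macro": 0.520344156087331},
    "darumeru/RWSD": {"acc": 0.5343137254901961},
    "darumeru/cp_para_ru": {"lcs": 1.0},
    "darumeru/ruOpenBookQA": {"acc": 0.8191580756013745, "f1_macro": 0.8196610608491144},
    "darumeru/ruWorldTree": {"acc": 0.9047619047619048, "f1_macro": 0.9048601269315972},
    "nlpcoreteam/enMMLU": {"acc": 0.7304583527816666},
    "nlpcoreteam/ruMMLU": {"acc": 0.6282155966274303},
}


def task_score(metrics: dict) -> float:
    """Average all metric values reported for one task."""
    return mean(metrics.values())


# Per-task scores, then the unweighted mean across tasks.
scores = {task: task_score(m) for task, m in results.items()}
overall = mean(scores.values())

for task, score in scores.items():
    print(f"{task:<25} {score:.3f}")
print(f"{'mean':<25} {overall:.3f}")
```

Running it prints one line per task followed by the overall mean, which should agree with the final summary table above.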