|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.2578|±  |0.0037|
| - humanities                          |N/A    |none  |     0|acc   |0.2465|±  |0.0063|
|  - formal_logic                       |      0|none  |     0|acc   |0.2857|±  |0.0404|
|  - high_school_european_history       |      0|none  |     0|acc   |0.2424|±  |0.0335|
|  - high_school_us_history             |      0|none  |     0|acc   |0.2255|±  |0.0293|
|  - high_school_world_history          |      0|none  |     0|acc   |0.2489|±  |0.0281|
|  - international_law                  |      0|none  |     0|acc   |0.1983|±  |0.0364|
|  - jurisprudence                      |      0|none  |     0|acc   |0.2685|±  |0.0428|
|  - logical_fallacies                  |      0|none  |     0|acc   |0.2515|±  |0.0341|
|  - moral_disputes                     |      0|none  |     0|acc   |0.2630|±  |0.0237|
|  - moral_scenarios                    |      0|none  |     0|acc   |0.2335|±  |0.0141|
|  - philosophy                         |      0|none  |     0|acc   |0.2122|±  |0.0232|
|  - prehistory                         |      0|none  |     0|acc   |0.2377|±  |0.0237|
|  - professional_law                   |      0|none  |     0|acc   |0.2640|±  |0.0113|
|  - world_religions                    |      0|none  |     0|acc   |0.2164|±  |0.0316|
| - other                               |N/A    |none  |     0|acc   |0.2617|±  |0.0079|
|  - business_ethics                    |      0|none  |     0|acc   |0.2300|±  |0.0423|
|  - clinical_knowledge                 |      0|none  |     0|acc   |0.3019|±  |0.0283|
|  - college_medicine                   |      0|none  |     0|acc   |0.3179|±  |0.0355|
|  - global_facts                       |      0|none  |     0|acc   |0.1800|±  |0.0386|
|  - human_aging                        |      0|none  |     0|acc   |0.1794|±  |0.0257|
|  - management                         |      0|none  |     0|acc   |0.3107|±  |0.0458|
|  - marketing                          |      0|none  |     0|acc   |0.2863|±  |0.0296|
|  - medical_genetics                   |      0|none  |     0|acc   |0.2900|±  |0.0456|
|  - miscellaneous                      |      0|none  |     0|acc   |0.2401|±  |0.0153|
|  - nutrition                          |      0|none  |     0|acc   |0.2843|±  |0.0258|
|  - professional_accounting            |      0|none  |     0|acc   |0.2872|±  |0.0270|
|  - professional_medicine              |      0|none  |     0|acc   |0.2831|±  |0.0274|
|  - virology                           |      0|none  |     0|acc   |0.2169|±  |0.0321|
| - social_sciences                     |N/A    |none  |     0|acc   |0.2720|±  |0.0080|
|  - econometrics                       |      0|none  |     0|acc   |0.2193|±  |0.0389|
|  - high_school_geography              |      0|none  |     0|acc   |0.3030|±  |0.0327|
|  - high_school_government_and_politics|      0|none  |     0|acc   |0.2798|±  |0.0324|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.3385|±  |0.0240|
|  - high_school_microeconomics         |      0|none  |     0|acc   |0.2983|±  |0.0297|
|  - high_school_psychology             |      0|none  |     0|acc   |0.2807|±  |0.0193|
|  - human_sexuality                    |      0|none  |     0|acc   |0.2595|±  |0.0384|
|  - professional_psychology            |      0|none  |     0|acc   |0.2288|±  |0.0170|
|  - public_relations                   |      0|none  |     0|acc   |0.2273|±  |0.0401|
|  - security_studies                   |      0|none  |     0|acc   |0.2571|±  |0.0280|
|  - sociology                          |      0|none  |     0|acc   |0.2388|±  |0.0301|
|  - us_foreign_policy                  |      0|none  |     0|acc   |0.3200|±  |0.0469|
| - stem                                |N/A    |none  |     0|acc   |0.2569|±  |0.0078|
|  - abstract_algebra                   |      0|none  |     0|acc   |0.1900|±  |0.0394|
|  - anatomy                            |      0|none  |     0|acc   |0.2444|±  |0.0371|
|  - astronomy                          |      0|none  |     0|acc   |0.2105|±  |0.0332|
|  - college_biology                    |      0|none  |     0|acc   |0.2917|±  |0.0380|
|  - college_chemistry                  |      0|none  |     0|acc   |0.3200|±  |0.0469|
|  - college_computer_science           |      0|none  |     0|acc   |0.2900|±  |0.0456|
|  - college_mathematics                |      0|none  |     0|acc   |0.2400|±  |0.0429|
|  - college_physics                    |      0|none  |     0|acc   |0.2549|±  |0.0434|
|  - computer_security                  |      0|none  |     0|acc   |0.2700|±  |0.0446|
|  - conceptual_physics                 |      0|none  |     0|acc   |0.2766|±  |0.0292|
|  - electrical_engineering             |      0|none  |     0|acc   |0.2414|±  |0.0357|
|  - elementary_mathematics             |      0|none  |     0|acc   |0.2619|±  |0.0226|
|  - high_school_biology                |      0|none  |     0|acc   |0.2710|±  |0.0253|
|  - high_school_chemistry              |      0|none  |     0|acc   |0.2167|±  |0.0290|
|  - high_school_computer_science       |      0|none  |     0|acc   |0.2100|±  |0.0409|
|  - high_school_mathematics            |      0|none  |     0|acc   |0.2407|±  |0.0261|
|  - high_school_physics                |      0|none  |     0|acc   |0.2715|±  |0.0363|
|  - high_school_statistics             |      0|none  |     0|acc   |0.2731|±  |0.0304|
|  - machine_learning                   |      0|none  |     0|acc   |0.2946|±  |0.0433|
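
One way to read the Value and Stderr columns together is to ask how many standard errors each accuracy sits from chance. The sketch below is illustrative only: it assumes a chance level of 0.25 (MMLU questions have four answer options), copies a handful of rows from the table verbatim, and uses hypothetical helper names (`parse_row`, `z_vs_chance`, `summarize`) that are not part of any evaluation tool. It is not a formal significance test, and the per-subject rows are not corrected for multiple comparisons.

```python
# Assumption: four-option multiple choice, so random guessing scores 0.25.
CHANCE = 0.25

# A few rows copied verbatim from the results table.
ROWS = """\
|mmlu                                   |N/A    |none  |     0|acc   |0.2578|±  |0.0037|
| - humanities                          |N/A    |none  |     0|acc   |0.2465|±  |0.0063|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.3385|±  |0.0240|
|  - human_aging                        |      0|none  |     0|acc   |0.1794|±  |0.0257|
"""


def parse_row(line):
    """Return (task name, accuracy, stderr) from one pipe-delimited table row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name = cells[0].lstrip("- ")  # drop the "- " nesting prefix on subtasks
    return name, float(cells[5]), float(cells[7])


def z_vs_chance(value, stderr, chance=CHANCE):
    """Distance of an accuracy from the chance level, in standard errors."""
    return (value - chance) / stderr


def summarize(text):
    """Map each task name in `text` to its rounded z-score against chance."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            name, value, stderr = parse_row(line)
            result[name] = round(z_vs_chance(value, stderr), 2)
    return result


if __name__ == "__main__":
    for name, z in summarize(ROWS).items():
        print(f"{name:30s} z = {z:+.2f}")
```

Under these assumptions the aggregate `mmlu` score is roughly two standard errors above 0.25, while individual subjects fall on both sides of chance.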