|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.4082|±  |0.0041|
| - humanities                          |N/A    |none  |     0|acc   |0.3783|±  |0.0069|
|  - formal_logic                       |      0|none  |     0|acc   |0.3333|±  |0.0422|
|  - high_school_european_history       |      0|none  |     0|acc   |0.5818|±  |0.0385|
|  - high_school_us_history             |      0|none  |     0|acc   |0.4804|±  |0.0351|
|  - high_school_world_history          |      0|none  |     0|acc   |0.5781|±  |0.0321|
|  - international_law                  |      0|none  |     0|acc   |0.5372|±  |0.0455|
|  - jurisprudence                      |      0|none  |     0|acc   |0.4352|±  |0.0479|
|  - logical_fallacies                  |      0|none  |     0|acc   |0.4785|±  |0.0392|
|  - moral_disputes                     |      0|none  |     0|acc   |0.4017|±  |0.0264|
|  - moral_scenarios                    |      0|none  |     0|acc   |0.2413|±  |0.0143|
|  - philosophy                         |      0|none  |     0|acc   |0.4952|±  |0.0284|
|  - prehistory                         |      0|none  |     0|acc   |0.4321|±  |0.0276|
|  - professional_law                   |      0|none  |     0|acc   |0.3031|±  |0.0117|
|  - world_religions                    |      0|none  |     0|acc   |0.6023|±  |0.0375|
| - other                               |N/A    |none  |     0|acc   |0.4606|±  |0.0089|
|  - business_ethics                    |      0|none  |     0|acc   |0.4700|±  |0.0502|
|  - clinical_knowledge                 |      0|none  |     0|acc   |0.4717|±  |0.0307|
|  - college_medicine                   |      0|none  |     0|acc   |0.4509|±  |0.0379|
|  - global_facts                       |      0|none  |     0|acc   |0.3000|±  |0.0461|
|  - human_aging                        |      0|none  |     0|acc   |0.3946|±  |0.0328|
|  - management                         |      0|none  |     0|acc   |0.4369|±  |0.0491|
|  - marketing                          |      0|none  |     0|acc   |0.5940|±  |0.0322|
|  - medical_genetics                   |      0|none  |     0|acc   |0.5000|±  |0.0503|
|  - miscellaneous                      |      0|none  |     0|acc   |0.5223|±  |0.0179|
|  - nutrition                          |      0|none  |     0|acc   |0.4444|±  |0.0285|
|  - professional_accounting            |      0|none  |     0|acc   |0.3794|±  |0.0289|
|  - professional_medicine              |      0|none  |     0|acc   |0.4228|±  |0.0300|
|  - virology                           |      0|none  |     0|acc   |0.3735|±  |0.0377|
| - social_sciences                     |N/A    |none  |     0|acc   |0.4621|±  |0.0089|
|  - econometrics                       |      0|none  |     0|acc   |0.2105|±  |0.0384|
|  - high_school_geography              |      0|none  |     0|acc   |0.4545|±  |0.0355|
|  - high_school_government_and_politics|      0|none  |     0|acc   |0.5440|±  |0.0359|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.3872|±  |0.0247|
|  - high_school_microeconomics         |      0|none  |     0|acc   |0.3529|±  |0.0310|
|  - high_school_psychology             |      0|none  |     0|acc   |0.5468|±  |0.0213|
|  - human_sexuality                    |      0|none  |     0|acc   |0.5115|±  |0.0438|
|  - professional_psychology            |      0|none  |     0|acc   |0.4297|±  |0.0200|
|  - public_relations                   |      0|none  |     0|acc   |0.4273|±  |0.0474|
|  - security_studies                   |      0|none  |     0|acc   |0.4571|±  |0.0319|
|  - sociology                          |      0|none  |     0|acc   |0.5920|±  |0.0348|
|  - us_foreign_policy                  |      0|none  |     0|acc   |0.6200|±  |0.0488|
| - stem                                |N/A    |none  |     0|acc   |0.3486|±  |0.0084|
|  - abstract_algebra                   |      0|none  |     0|acc   |0.2800|±  |0.0451|
|  - anatomy                            |      0|none  |     0|acc   |0.4148|±  |0.0426|
|  - astronomy                          |      0|none  |     0|acc   |0.4079|±  |0.0400|
|  - college_biology                    |      0|none  |     0|acc   |0.4583|±  |0.0417|
|  - college_chemistry                  |      0|none  |     0|acc   |0.3500|±  |0.0479|
|  - college_computer_science           |      0|none  |     0|acc   |0.3200|±  |0.0469|
|  - college_mathematics                |      0|none  |     0|acc   |0.3100|±  |0.0465|
|  - college_physics                    |      0|none  |     0|acc   |0.3137|±  |0.0462|
|  - computer_security                  |      0|none  |     0|acc   |0.5200|±  |0.0502|
|  - conceptual_physics                 |      0|none  |     0|acc   |0.3489|±  |0.0312|
|  - electrical_engineering             |      0|none  |     0|acc   |0.3655|±  |0.0401|
|  - elementary_mathematics             |      0|none  |     0|acc   |0.2725|±  |0.0229|
|  - high_school_biology                |      0|none  |     0|acc   |0.4581|±  |0.0283|
|  - high_school_chemistry              |      0|none  |     0|acc   |0.3399|±  |0.0333|
|  - high_school_computer_science       |      0|none  |     0|acc   |0.4100|±  |0.0494|
|  - high_school_mathematics            |      0|none  |     0|acc   |0.2407|±  |0.0261|
|  - high_school_physics                |      0|none  |     0|acc   |0.2914|±  |0.0371|
|  - high_school_statistics             |      0|none  |     0|acc   |0.2870|±  |0.0309|
|  - machine_learning                   |      0|none  |     0|acc   |0.3929|±  |0.0464|
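
The Value and Stderr columns can be read together as an approximate confidence interval. The sketch below is a minimal illustration, not part of the evaluation run: it assumes Stderr is the standard error of the accuracy, uses a normal approximation (multiplier 1.96 for roughly 95%), and assumes a chance baseline of 0.25 for four-option multiple-choice questions. The helper names `interval` and `above_chance` are hypothetical, and only a few rows from the table are copied in.

```python
# Rows copied from the table above: (task, acc, stderr).
ROWS = [
    ("mmlu", 0.4082, 0.0041),
    ("moral_scenarios", 0.2413, 0.0143),
    ("econometrics", 0.2105, 0.0384),
    ("us_foreign_policy", 0.6200, 0.0488),
]

CHANCE = 0.25  # assumption: four answer options per question
Z_95 = 1.96    # normal-approximation multiplier for a ~95% interval


def interval(acc, stderr, z=Z_95):
    """Return the (low, high) bounds of acc +/- z * stderr."""
    return acc - z * stderr, acc + z * stderr


def above_chance(acc, stderr, chance=CHANCE):
    """True when the whole interval lies above the chance baseline."""
    low, _ = interval(acc, stderr)
    return low > chance


if __name__ == "__main__":
    for task, acc, stderr in ROWS:
        low, high = interval(acc, stderr)
        flag = above_chance(acc, stderr)
        print(f"{task:20s} {low:.4f} .. {high:.4f}  above chance: {flag}")
```

Under these assumptions, the aggregate mmlu score and us_foreign_policy sit clearly above 0.25, while the intervals for moral_scenarios and econometrics do not clear the baseline, so those two rows should not be read as evidence of better-than-chance performance.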