Commit 0136eef, committed by leaderboard-pr-bot
1 Parent(s): ef9d6fd

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +119 -3
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
+ language:
+ - en
  license: other
  tags:
  - axolotl
@@ -54,8 +56,109 @@ datasets:
  - HuggingFaceH4/no_robots
  - OpenAssistant/oasst_top1_2023-08-25
  - WizardLM/WizardLM_evol_instruct_70k
- language:
- - en
+ model-index:
+ - name: Einstein-v6-7B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 61.52
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 80.91
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 62.02
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 51.24
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 78.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 58.61
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6-7B
+       name: Open LLM Leaderboard
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/CxDk4KKhQqL-Pg0AMn1gb.png)
  
@@ -291,4 +394,17 @@ Thanks to all open source AI community.
  
  If you would like to support me:
  
- [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v6-7B)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |65.47|
+ |AI2 Reasoning Challenge (25-Shot)|61.52|
+ |HellaSwag (10-Shot)              |80.91|
+ |MMLU (5-Shot)                    |62.02|
+ |TruthfulQA (0-shot)              |51.24|
+ |Winogrande (5-shot)              |78.53|
+ |GSM8k (5-shot)                   |58.61|
+
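
The `Avg.` row in the results table looks like the unweighted mean of the six benchmark scores. The sketch below checks that reading; the averaging rule is an assumption inferred from the numbers, not something the PR states, and the names `scores` and `leaderboard_average` are illustrative rather than part of any leaderboard tooling.

```python
from typing import Mapping

# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.52,
    "HellaSwag (10-Shot)": 80.91,
    "MMLU (5-Shot)": 62.02,
    "TruthfulQA (0-shot)": 51.24,
    "Winogrande (5-shot)": 78.53,
    "GSM8k (5-shot)": 58.61,
}


def leaderboard_average(values: Mapping[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    return round(sum(values.values()) / len(values), 2)


if __name__ == "__main__":
    print(leaderboard_average(scores))
```

Under that assumption the result matches the reported `Avg.` value of 65.47, which suggests the table and the `model-index` metadata carry the same six scores.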