icefog72 leaderboard-pr-bot committed on
Commit
f99de98
1 Parent(s): 7ea0bdf

Adding Evaluation Results (#2)

- Adding Evaluation Results (bb70e0be7b4d178d0e63e044b6b9dfc99dcf5eb2)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -118,6 +118,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 52.12
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 30.14
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.83
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.37
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.2
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 22.97
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=icefog72/IceLemonTeaRP-32k-7b
+      name: Open LLM Leaderboard
 ---
 # IceLemonTeaRP-32k-7b
 
@@ -234,3 +326,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |79.72|
 |GSM8k (5-shot) |62.40|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_icefog72__IceLemonTeaRP-32k-7b)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |21.27|
+|IFEval (0-Shot)    |52.12|
+|BBH (3-Shot)       |30.14|
+|MATH Lvl 5 (4-Shot)| 4.83|
+|GPQA (0-shot)      | 5.37|
+|MuSR (0-shot)      |12.20|
+|MMLU-PRO (5-shot)  |22.97|
+
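
The `Avg.` row in the added table is consistent with an unweighted mean of the six benchmark values. The following is a minimal Python sketch of that check, assuming plain arithmetic averaging rounded to two decimals; the leaderboard's own aggregation code is not part of this commit, and the helper name `mean_score` is hypothetical.

```python
# Scores copied from the table added to README.md in this commit.
scores = {
    "IFEval (0-Shot)": 52.12,
    "BBH (3-Shot)": 30.14,
    "MATH Lvl 5 (4-Shot)": 4.83,
    "GPQA (0-shot)": 5.37,
    "MuSR (0-shot)": 12.20,
    "MMLU-PRO (5-shot)": 22.97,
}


def mean_score(values):
    """Unweighted arithmetic mean rounded to two decimals (assumed aggregation)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")
```

The result can be compared by eye against the `Avg.` value reported in the table.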