alexmarques committed
Commit: 55c55b9
Parent(s): b1f228e
Update README.md

README.md CHANGED
@@ -19,7 +19,7 @@ pipeline_tag: text-generation
 - **Model Developers:** Neural Magic
 
 Quantized version of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), a 14 billion-parameter open model trained using the Phi-3 datasets.
-It achieves an average score of 73.
+It achieves an average score of 73.99 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.32.
 
 ### Model Optimizations
 
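For reference, the updated averages and the recovery figures are internally consistent: recovery appears to be the quantized score divided by the unquantized score. A minimal sketch checking this against the per-task values in the table hunk below (the values are taken verbatim from the diff; the column-order reading, baseline first and quantized second, is an assumption):

```python
# Sanity check of the updated scores, using only values from the table hunk
# below. Assumed column order: unquantized baseline, quantized model,
# recovery = quantized / baseline * 100.
# Order: MMLU, ARC-C, GSM-8K, Hellaswag, Winogrande, TruthfulQA
baseline  = [75.64, 67.58, 83.32, 84.37, 75.45, 53.54]
quantized = [76.79, 69.54, 84.31, 84.96, 73.64, 54.73]

base_avg  = sum(baseline)  / len(baseline)    # 73.32, as stated in the prose
quant_avg = sum(quantized) / len(quantized)   # 73.99 (73.995 unrounded)
recovery  = 100 * quant_avg / base_avg        # 100.9%, matching the Average row
print(f"baseline {base_avg:.2f}, quantized {quant_avg:.2f}, recovery {recovery:.1f}%")
```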
@@ -181,71 +181,71 @@ lm_eval \
  <tr>
   <td>MMLU (5-shot)
   </td>
-  <td>
+  <td>75.64
   </td>
-  <td>76.
+  <td>76.79
   </td>
-  <td>
+  <td>101.5%
   </td>
  </tr>
  <tr>
   <td>ARC Challenge (25-shot)
   </td>
-  <td>
+  <td>67.58
   </td>
-  <td>69.
+  <td>69.54
   </td>
-  <td>
+  <td>102.9%
   </td>
  </tr>
  <tr>
   <td>GSM-8K (5-shot, strict-match)
   </td>
-  <td>
+  <td>83.32
   </td>
-  <td>
+  <td>84.31
   </td>
-  <td>
+  <td>101.2%
   </td>
  </tr>
  <tr>
   <td>Hellaswag (10-shot)
   </td>
-  <td>
+  <td>84.37
   </td>
   <td>84.96
   </td>
-  <td>
+  <td>100.7%
   </td>
  </tr>
  <tr>
   <td>Winogrande (5-shot)
   </td>
-  <td>
+  <td>75.45
   </td>
-  <td>73.
+  <td>73.64
   </td>
-  <td>
+  <td>97.6%
   </td>
  </tr>
  <tr>
   <td>TruthfulQA (0-shot)
   </td>
-  <td>54
+  <td>53.54
   </td>
-  <td>54.
+  <td>54.73
   </td>
-  <td>
+  <td>102.2%
   </td>
  </tr>
  <tr>
   <td><strong>Average</strong>
   </td>
-  <td><strong>73.
+  <td><strong>73.32</strong>
   </td>
-  <td><strong>
+  <td><strong>73.99</strong>
   </td>
-  <td><strong>100.
+  <td><strong>100.9%</strong>
   </td>
  </tr>
 </table>
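The hunk context above shows this table sits beneath an `lm_eval` invocation whose full command is truncated in the diff. As an illustration only, here is a minimal sketch of scoring one row (ARC Challenge, 25-shot) with EleutherAI's lm-evaluation-harness Python API; the model id is a hypothetical placeholder, not the card's actual repository name, and the backend and batching choices are assumptions:

```python
# Minimal sketch: score one OpenLLM v1 task with lm-evaluation-harness
# (https://github.com/EleutherAI/lm-evaluation-harness). Illustrative only;
# the model card's full lm_eval command is not visible in this diff.
import lm_eval

MODEL_ID = "<quantized-model-repo>"  # hypothetical placeholder; substitute the real repo

results = lm_eval.simple_evaluate(
    model="hf",                           # Hugging Face transformers backend
    model_args=f"pretrained={MODEL_ID}",
    tasks=["arc_challenge"],              # ARC Challenge row of the table
    num_fewshot=25,                       # 25-shot, matching the table header
    batch_size="auto",
)
print(results["results"]["arc_challenge"])
```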