alexmarques committed on
Commit
55c55b9
1 Parent(s): b1f228e

Update README.md

Files changed (1): README.md +21 -21
README.md CHANGED
@@ -19,7 +19,7 @@ pipeline_tag: text-generation
 - **Model Developers:** Neural Magic
 
 Quantized version of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), a 14 billion-parameter open model trained using the Phi-3 datasets.
-It achieves an average score of 73.87 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.07.
+It achieves an average score of 73.99 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.32.
 
 ### Model Optimizations
 
@@ -181,71 +181,71 @@ lm_eval \
 <tr>
 <td>MMLU (5-shot)
 </td>
-<td>76.61
+<td>75.64
 </td>
-<td>76.68
+<td>76.79
 </td>
-<td>100.1%
+<td>101.5%
 </td>
 </tr>
 <tr>
 <td>ARC Challenge (25-shot)
 </td>
-<td>69.37
+<td>67.58
 </td>
-<td>69.71
+<td>69.54
 </td>
-<td>100.5%
+<td>102.9%
 </td>
 </tr>
 <tr>
 <td>GSM-8K (5-shot, strict-match)
 </td>
-<td>84.31
+<td>83.32
 </td>
-<td>85.06
+<td>84.31
 </td>
-<td>100.0%
+<td>101.2%
 </td>
 </tr>
 <tr>
 <td>Hellaswag (10-shot)
 </td>
-<td>85.06
+<td>84.37
 </td>
 <td>84.96
 </td>
-<td>99.9%
+<td>100.7%
 </td>
 </tr>
 <tr>
 <td>Winogrande (5-shot)
 </td>
-<td>73.32
+<td>75.45
 </td>
-<td>73.32
+<td>73.64
 </td>
-<td>100.0%
+<td>97.6%
 </td>
 </tr>
 <tr>
 <td>TruthfulQA (0-shot)
 </td>
-<td>54.57
+<td>53.54
 </td>
-<td>54.68
+<td>54.73
 </td>
-<td>100.2%
+<td>102.2%
 </td>
 </tr>
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>73.87</strong>
+<td><strong>73.32</strong>
 </td>
-<td><strong>74.07</strong>
+<td><strong>73.99</strong>
 </td>
-<td><strong>100.3%</strong>
+<td><strong>100.9%</strong>
 </td>
 </tr>
 </table>
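As a sanity check on the updated numbers, the recovery column can be recomputed from the per-task scores. This is a minimal sketch, assuming (from the averages of 73.32 and 73.99 in the diff) that the first score column is the unquantized baseline, the second is the quantized model, and recovery = quantized / baseline × 100:

```python
# Per-task OpenLLM v1 scores from the updated table.
# Assumed column order: (unquantized baseline, quantized model).
scores = {
    "MMLU (5-shot)": (75.64, 76.79),
    "ARC Challenge (25-shot)": (67.58, 69.54),
    "GSM-8K (5-shot, strict-match)": (83.32, 84.31),
    "Hellaswag (10-shot)": (84.37, 84.96),
    "Winogrande (5-shot)": (75.45, 73.64),
    "TruthfulQA (0-shot)": (53.54, 54.73),
}

# Per-task recovery, rounded to one decimal as in the table.
for task, (baseline, quantized) in scores.items():
    recovery = 100.0 * quantized / baseline
    print(f"{task}: {recovery:.1f}%")

# Averages across the six tasks, and overall recovery.
avg_baseline = sum(b for b, _ in scores.values()) / len(scores)
avg_quantized = sum(q for _, q in scores.values()) / len(scores)
print(f"Average: {avg_baseline:.2f} vs {avg_quantized:.2f}")
print(f"Overall recovery: {100.0 * avg_quantized / avg_baseline:.1f}%")
```

Running this reproduces the recovery values in the new table (101.5%, 102.9%, 101.2%, 100.7%, 97.6%, 102.2%, overall 100.9%), confirming the averages quoted in the README sentence.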