alexmarques committed on
Commit
c1fdfad
1 Parent(s): 1d11c7a

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -13,19 +13,19 @@ license: llama2
  - **Output:** Text
  - **Model Optimizations:**
  - **Weight quantization:** INT4
- - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct), this models is intended for assistant-like chat.
+ - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), this models is intended for assistant-like chat.
  - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
  - **Release Date:** 7/11/2024
  - **Version:** 1.0
  - **License(s)**: [MIT](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE)
  - **Model Developers:** Neural Magic
 
- Quantized version of [Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct).
+ Quantized version of [Phi-3-medium-128-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct).
  It achieves an average score of 72.38 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.46.
 
  ### Model Optimizations
 
- This model was obtained by quantizing the weights of [Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) to INT4 data type.
+ This model was obtained by quantizing the weights of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) to INT4 data type.
  This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 25%.
 
  Only the weights of the linear operators within transformers blocks are quantized. Symmetric group-wise quantization is applied, in which a linear scaling per group maps the INT4 and floating point representations of the quantized weights.
@@ -120,7 +120,7 @@ from llmcompressor.modifiers.quantization import GPTQModifier
  from datasets import load_dataset
  import random
 
- model_id = "microsoft/Phi-3-medium-4k-instruct"
+ model_id = "microsoft/Phi-3-medium-128k-instruct"
 
  num_samples = 512
  max_seq_len = 4096
@@ -184,7 +184,7 @@ lm_eval \
  <tr>
  <td><strong>Benchmark</strong>
  </td>
- <td><strong>Phi-3-medium-4k-instruct </strong>
+ <td><strong>Phi-3-medium-128k-instruct </strong>
  </td>
  <td><strong>Phi-3-medium-128k-instruct-quantized.w4a16(this model)</strong>
  </td>
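
Note on the context lines of the first hunk: they describe symmetric group-wise INT4 weight quantization and a drop from 16 to 4 bits per parameter. The sketch below illustrates that scheme in plain Python, under stated assumptions. It uses simple round-to-nearest rather than the GPTQ procedure the recipe in this README imports, the group size of 4 is chosen only to keep the example small (the diff does not state the real group size), and all function names are hypothetical. The size ratio ignores the storage needed for the per-group scales.

```python
# Illustrative sketch only: round-to-nearest symmetric group-wise quantization.
# This is NOT the GPTQ algorithm used by llmcompressor; names are hypothetical.

def quantize_group(weights, num_bits=4):
    """Quantize one group with a single scale and no zero point (symmetric)."""
    qmax = 2 ** (num_bits - 1) - 1       # 7 for INT4
    qmin = -(2 ** (num_bits - 1))        # -8 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer level, clamped to the INT4 range.
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Map integer levels back to floating point with the group's linear scale."""
    return [v * scale for v in q]


def quantize_groupwise(weights, group_size=4):
    """Split a flat weight list into groups and quantize each independently."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return [quantize_group(g) for g in groups]


def size_ratio(bits_before=16, bits_after=4):
    """Quantized weight size as a fraction of the original (scales ignored)."""
    return bits_after / bits_before


if __name__ == "__main__":
    weights = [3.5, -1.0, 0.0, 0.5, 0.07, -0.02, 0.0, 0.01]
    for q, scale in quantize_groupwise(weights):
        print(q, scale, dequantize_group(q, scale))
    print("size ratio:", size_ratio())
```

Because each group carries its own scale, the second group of small weights in the example keeps its resolution instead of being rounded to zero by the larger weights in the first group. With these assumptions, `size_ratio()` evaluates to 0.25, meaning the quantized weights take about a quarter of the original 16-bit footprint before the scale overhead is added.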