alexmarques committed
Commit c1fdfad • Parent: 1d11c7a
Update README.md

README.md CHANGED
@@ -13,19 +13,19 @@ license: llama2
 - **Output:** Text
 - **Model Optimizations:**
   - **Weight quantization:** INT4
-- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Phi-3-medium-
+- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct), this model is intended for assistant-like chat.
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 7/11/2024
 - **Version:** 1.0
 - **License(s)**: [MIT](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE)
 - **Model Developers:** Neural Magic
 
-Quantized version of [Phi-3-medium-
+Quantized version of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct).
 It achieves an average score of 72.38 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.46.
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Phi-3-medium-
+This model was obtained by quantizing the weights of [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) to the INT4 data type.
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformer blocks are quantized. Symmetric group-wise quantization is applied, in which a linear scaling per group maps the INT4 and floating point representations of the quantized weights.
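To make the group-wise scheme concrete: each group of consecutive weights in a linear layer shares a single scale, chosen so that the group's largest magnitude lands at the edge of the INT4 range. A minimal sketch follows; the group size of 128 and the helper name are illustrative assumptions, since the diff does not show these details.

```python
import torch

def quantize_group_symmetric(w: torch.Tensor, group_size: int = 128):
    """Sketch of symmetric group-wise INT4 quantization; group_size is an assumption."""
    groups = w.reshape(-1, group_size)               # assumes numel divisible by group_size
    scale = groups.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0  # one linear scale per group
    q = torch.clamp(torch.round(groups / scale), -8, 7)                   # INT4 codes in [-8, 7]
    w_hat = q * scale                                                     # floating-point reconstruction
    return q.to(torch.int8), scale, w_hat
```

Only the INT4 codes plus one 16-bit scale per group need to be stored, which is where the roughly 4x reduction in weight footprint comes from.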
@@ -120,7 +120,7 @@ from llmcompressor.modifiers.quantization import GPTQModifier
 from datasets import load_dataset
 import random
 
-model_id = "microsoft/Phi-3-medium-
+model_id = "microsoft/Phi-3-medium-128k-instruct"
 
 num_samples = 512
 max_seq_len = 4096
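This hunk only updates the `model_id` line of the README's calibration snippet. For orientation, here is a minimal sketch of how those fragments typically fit together in an llm-compressor GPTQ one-shot run; the calibration dataset, the recipe arguments, and the save directory are assumptions rather than contents of the diff.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot  # import path used by 2024 llm-compressor releases
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "microsoft/Phi-3-medium-128k-instruct"
num_samples = 512
max_seq_len = 4096

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Calibration data: chat-formatted prompts (ultrachat is an assumed choice).
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(num_samples))
ds = ds.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})
ds = ds.map(
    lambda ex: tokenizer(ex["text"], max_length=max_seq_len, truncation=True, add_special_tokens=False),
    remove_columns=ds.column_names,
)

# GPTQ recipe: INT4 weights with 16-bit activations, leaving lm_head unquantized.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=max_seq_len,
    num_calibration_samples=num_samples,
)

model.save_pretrained("Phi-3-medium-128k-instruct-quantized.w4a16")
tokenizer.save_pretrained("Phi-3-medium-128k-instruct-quantized.w4a16")
```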
@@ -184,7 +184,7 @@ lm_eval \
 <tr>
  <td><strong>Benchmark</strong>
  </td>
- <td><strong>Phi-3-medium-
+ <td><strong>Phi-3-medium-128k-instruct</strong>
  </td>
  <td><strong>Phi-3-medium-128k-instruct-quantized.w4a16 (this model)</strong>
  </td>
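The context line of the final hunk shows the README's `lm_eval` invocation. As a usage sketch, reproducing the OpenLLM v1 numbers with lm-evaluation-harness could look like the following; the vLLM backend, the `openllm` task group, and the repository id (taken from the table header) are assumptions, since the diff shows only the first line of the command.

```bash
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Phi-3-medium-128k-instruct-quantized.w4a16",dtype=auto,max_model_len=4096 \
  --tasks openllm \
  --batch_size auto
```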