Update README.md
README.md CHANGED

@@ -7,7 +7,6 @@ language:

# Model Card for Breeze-7B-Instruct-v0.1

-
Breeze-7B is a language model that builds upon the foundation of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), specifically enhanced for Traditional Chinese.

[Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) introduces an expanded vocabulary with an additional 30,000 Traditional Chinese tokens and

@@ -67,7 +66,7 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
We use code revised from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU** (see the sketch below the table).

-| Models |
+| Models | |↑ TMMLU+ (ACC)| DRCD (EM)   | Table (ACC) | MMLU (ACC) |
|----------------------------------------------|--------|--------------|-------------|-------------|------------|
|                                              |        |TC, Knowledge |TC, Reasoning|TC, Reasoning|EN, Knowledge|
|                                              |        | 5 shot       | 3 shot      | 5 shot      | 5 shot     |
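
A minimal sketch of such a harness run, using the upstream lm-evaluation-harness Python API (`lm_eval` >= 0.4). The Traditional-Chinese task names used by the revised fork are not given in this README, so `tmmluplus` and `drcd` below are placeholders:

```python
# Hedged sketch: upstream lm_eval >= 0.4 API. The TMMLU+/DRCD task names
# in MediaTek-Research's revised harness are assumptions, not confirmed.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MediaTek-Research/Breeze-7B-Base-v0.1,dtype=bfloat16",
    tasks=["tmmluplus", "drcd", "mmlu"],  # placeholder task names
    num_fewshot=5,  # the table reports 5-shot for TMMLU+/MMLU, 3-shot for DRCD
)
print(results["results"])
```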

@@ -83,14 +82,14 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa

**Category ACC of TMMLU+ (5 shot)**

-| Models                  | STEM  | Social Science | Humanities | Other |
+| Models                  | STEM  | Social Science | Humanities | Other |↑ AVG  |
+|----------------------------------|--------------|----------------|------------|------------|-------|
+| Yi-34B                  | 56.03 | 73.06 | 61.12 | 62.19 | 63.10 |
+| Qwen-14B                | 46.51 | 58.20 | 51.12 | 49.38 | 51.30 |
+| Yi-6B                   | 41.14 | 57.77 | 50.22 | 49.39 | 49.63 |
+| Qwen-7B                 | 28.25 | 47.80 | 43.14 | 42.17 | 42.84 |
+| **Breeze-7B-Base-v0.1** | 35.74 | 46.08 | 40.29 | 39.27 | 40.35 |
+| Mistral-7B-v0.1         | 33.01 | 42.23 | 35.86 | 37.63 | 36.93 |

@@ -105,7 +104,7 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
We use code revised from [fastchat llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) to evaluate **MT-Bench-tw** and **MT-Bench** (see the sketch below the table).

-| Models |
+| Models | |↑ MT-Bench-tw (Score)| TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM)   | Table (ACC) | MT-Bench (Score) | MMLU (ACC)  | MMLU (ACC)  |
|---------------------------------------------------------------------------------------------------------|--------|--------------------|--------------|--------------|-------------|-------------|------------------|-------------|-------------|
|                                                                                                          |        |TC, Chat            |TC, Knowledge |TC, Knowledge |TC, Reasoning|TC, Reasoning|EN, Chat          |EN, Knowledge|EN, Knowledge|
|                                                                                                          |        |0 shot              | 0 shot       | 5 shot       | 3 shot      | 0 shot      |0 shot            | 0 shot      | 5 shot      |
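
A rough driver for the MT-Bench scoring referenced above, under stated assumptions: the script names come from the upstream fastchat/llm_judge directory, and the exact entry points (and the MT-Bench-tw bench name) in the revised code may differ:

```python
# Hedged sketch: shells out to the upstream fastchat/llm_judge scripts,
# so it must be run from inside that directory. MediaTek-Research's
# revision for MT-Bench-tw may expose different options.
import subprocess

MODEL = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
MODEL_ID = "breeze-7b-instruct-v0.1"

# 1) Generate the model's answers to the benchmark questions.
subprocess.run(["python", "gen_model_answer.py",
                "--model-path", MODEL, "--model-id", MODEL_ID], check=True)

# 2) Have the GPT-4 judge score the answers.
subprocess.run(["python", "gen_judgment.py",
                "--model-list", MODEL_ID], check=True)

# 3) Print per-category and average scores.
subprocess.run(["python", "show_result.py"], check=True)
```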

@@ -123,8 +122,8 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa

**Category Score of MT-Bench-tw (0 shot)**

-| Models        | STEM |Extraction|Reasoning| Math | Coding | Roleplay| Writing |Humanities|
+| Models        | STEM |Extraction|Reasoning| Math | Coding | Roleplay| Writing |Humanities|↑ AVG |
+|-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| gpt-3.5-turbo | | | | | | | | | |
| Yi-34B-Chat   | | | | | | | | | |
| Qwen-14B-Chat | | | | | | | | | |

@@ -137,17 +136,17 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa

**Category ACC of TMMLU+ (0 shot)**

-| Model                           | STEM  | Social Science | Humanities | Other |
+| Model                           | STEM  | Social Science | Humanities | Other |↑ AVG  |
|-----------------------------------------------------|--------------|----------------|------------|------------|---------|
-| Taiwan-LLM-13B-v2.0-chat        | 27.74 | 33.69 | 27.03 | 29.43 |
-| Taiwan-LLM-7B-v2.1-chat         | 25.58 | 31.76 | 27.36 | 27.61 |
+| Yi-34B-Chat                     | 47.65 | 64.25 | 52.73 | 54.91 | 54.87 |
+| Qwen-14B-Chat                   | 43.83 | 55.00 | 48.55 | 46.22 | 48.41 |
+| Yi-6B-Chat                      | 37.80 | 51.74 | 45.36 | 44.25 | 44.79 |
+| gpt-3.5-turbo                   | 41.56 | 46.72 | 36.73 | 42.03 | 41.76 |
+| **Breeze-7B-Instruct-v0.1**     | 37.41 | 46.81 | 42.06 | 40.16 | 41.61 |
+| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35 | 40.31 | 39.40 | 40.99 |
+| Qwen-7B-Chat                    | 35.44 | 46.22 | 38.35 | 40.06 | 40.02 |
+| Taiwan-LLM-13B-v2.0-chat        | 27.74 | 33.69 | 27.03 | 29.43 | 29.47 |
+| Taiwan-LLM-7B-v2.1-chat         | 25.58 | 31.76 | 27.36 | 27.61 | 28.08 |

@@ -155,17 +154,17 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
All inferences run on 2 RTX A6000 GPUs (using `vllm`, with a tensor-parallel size of 2); a sketch of this setup follows the table.

-| Models                          | Inference Time (sec)|Estimated Max Input Length (Char)|
+| Models                          |↓ Inference Time (sec)|Estimated Max Input Length (Char)|
|--------------------------------------------------------------------|-------------------|--------------------------|
+| Yi-6B                           | 10.62 | 5.2k  |
+| **Breeze-7B-Instruct-v0.1**     | 10.74 | 11.1k |
+| **Breeze-7B-Instruct-64k-v0.1** | 10.74 | 88.8k |
+| Qwen-7B                         | 10.86 | 9.8k  |
+| Qwen-14B                        | 18.89 | 9.8k  |
+| Mistral-7B-v0.1                 | 20.48 | 5.1k  |
+| Taiwan-LLM-7B-v2.1-base         | 26.26 | 2.2k  |
+| Taiwan-LLM-13B-v2.0-base        | 36.80 | 2.2k  |
+| Yi-34B                          | 43.71 | 4.5k  |
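
A minimal sketch of the timing setup described above: vLLM with tensor parallelism across the 2 GPUs. The article variable is a stand-in for the source text, and the sampling settings are assumptions (the README does not specify them):

```python
# Hedged sketch of the timing setup: vLLM, tensor-parallel across 2 GPUs.
import time
from vllm import LLM, SamplingParams

article = "..."  # stand-in: paste the first 700 characters of the udn.com article

llm = LLM(
    model="MediaTek-Research/Breeze-7B-Instruct-v0.1",
    tensor_parallel_size=2,  # split weights across the 2 RTX A6000s
)
params = SamplingParams(temperature=0.0, max_tokens=1024)  # assumed settings

start = time.time()
outputs = llm.generate([article[:700]], params)
print(f"Inference time: {time.time() - start:.2f} sec")
print(outputs[0].outputs[0].text[:200])
```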

## Long-context Performance

@@ -209,3 +208,14 @@ The suggested default `SYS_PROMPT` is
```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
+
+## Citation
+
+```
+@article{breeze7b2024,
+  title={},
+  author={},
+  journal={arXiv},
+  year={2024}
+}
+```