YC-Chen committed on
Commit 277e69e • 1 Parent(s): 29e7be5

Update README.md

Files changed (1)
  1. README.md +43 -33
README.md CHANGED
@@ -7,7 +7,6 @@ language:
 
 # Model Card for Breeze-7B-Instruct-v0.1
 
-
 Breeze-7B is a language model that builds upon the foundation of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), specifically enhanced for Traditional Chinese.
 
 [Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) introduces an expanded vocabulary with an additional 30,000 Traditional Chinese tokens and
@@ -67,7 +66,7 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 We use the code revised from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate **TMMLU+**, **DRCD**, **Table**, and **MMLU**.
 
 
-| Models | | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MMLU (ACC) |
+| Models | | ↑ TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MMLU (ACC) |
 |--------|--|----------------|-------------|-------------|------------|
 | | | TC, Knowledge | TC, Reasoning | TC, Reasoning | EN, Knowledge |
 | | | 5 shot | 3 shot | 5 shot | 5 shot |
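A scoring run in the style of this table can be sketched with the upstream harness's Python API. This is a sketch only, not the authors' revised code: the task name `tmmluplus` and the model arguments are assumptions, and the DRCD and Table tasks presumably exist only in the revised fork.

```python
# Sketch using upstream lm-evaluation-harness (v0.4+), not the revised fork
# the README refers to. "tmmluplus" is an assumed task name.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MediaTek-Research/Breeze-7B-Base-v0.1,dtype=bfloat16",
    tasks=["tmmluplus"],
    num_fewshot=5,  # the 5-shot setting reported for TMMLU+ above
)
print(results["results"])
```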
@@ -83,14 +82,14 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 
 **Category ACC of TMMLU+ (5 shot)**
 
-| Models | STEM | Social Science | Humanities | Other |
-|--------|------|----------------|------------|-------|
-| Yi-34B | 56.03 | 73.06 | 61.12 | 62.19 |
-| Qwen-14B | 46.51 | 58.20 | 51.12 | 49.38 |
-| Yi-6B | 41.14 | 57.77 | 50.22 | 49.39 |
-| Qwen-7B | 28.25 | 47.80 | 43.14 | 42.17 |
-| **Breeze-7B-Base-v0.1** | 35.74 | 46.08 | 40.29 | 39.27 |
-| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |
+| Models | STEM | Social Science | Humanities | Other | ↑ AVG |
+|--------|------|----------------|------------|-------|-------|
+| Yi-34B | 56.03 | 73.06 | 61.12 | 62.19 | 63.10 |
+| Qwen-14B | 46.51 | 58.20 | 51.12 | 49.38 | 51.30 |
+| Yi-6B | 41.14 | 57.77 | 50.22 | 49.39 | 49.63 |
+| Qwen-7B | 28.25 | 47.80 | 43.14 | 42.17 | 42.84 |
+| **Breeze-7B-Base-v0.1** | 35.74 | 46.08 | 40.29 | 39.27 | 40.35 |
+| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 | 36.93 |
 
 
 
@@ -105,7 +104,7 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 We use the code revised from [fastchat llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) to evaluate **MT-Bench-tw** and **MT-Bench**.
 
 
-| Models | | MT-Bench-tw (Score) | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench (Score) | MMLU (ACC) | MMLU (ACC) |
+| Models | | ↑ MT-Bench-tw (Score) | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench (Score) | MMLU (ACC) | MMLU (ACC) |
 |--------|--|-----------------------|--------------|--------------|-----------|-------------|------------------|------------|------------|
 | | | TC, Chat | TC, Knowledge | TC, Knowledge | TC, Reasoning | TC, Reasoning | EN, Chat | EN, Knowledge | EN, Knowledge |
 | | | 0 shot | 0 shot | 5 shot | 3 shot | 0 shot | 0 shot | 0 shot | 5 shot |
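The MT-Bench scores follow FastChat's three-step generate/judge/report flow. A minimal sketch of the upstream pipeline, run from `fastchat/llm_judge`; MT-Bench-tw itself ships only with the authors' revised copy, the flags are taken from the upstream llm_judge README, and the judge step assumes `OPENAI_API_KEY` is set for GPT-4.

```python
# Sketch of the upstream FastChat llm_judge flow (not the MT-Bench-tw fork).
import subprocess

model_id = "breeze-7b-instruct"  # arbitrary label for the answer files

# 1) Generate the model's answers to the benchmark questions.
subprocess.run(["python", "gen_model_answer.py",
                "--model-path", "MediaTek-Research/Breeze-7B-Instruct-v0.1",
                "--model-id", model_id], check=True)

# 2) Score each answer with the GPT-4 judge.
subprocess.run(["python", "gen_judgment.py", "--model-list", model_id], check=True)

# 3) Print per-category and average scores.
subprocess.run(["python", "show_result.py", "--model-list", model_id], check=True)
```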
@@ -123,8 +122,8 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 
 **Category Score of MT-Bench-tw (0 shot)**
 
-| Models | STEM | Extraction | Reasoning | Math | Coding | Roleplay | Writing | Humanities | Average |
-|--------|------|------------|-----------|------|--------|----------|---------|------------|---------|
+| Models | STEM | Extraction | Reasoning | Math | Coding | Roleplay | Writing | Humanities | ↑ AVG |
+|--------|------|------------|-----------|------|--------|----------|---------|------------|-------|
 | gpt-3.5-turbo | | | | | | | | | |
 | Yi-34B-Chat | | | | | | | | | |
 | Qwen-14B-Chat | | | | | | | | | |
@@ -137,17 +136,17 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 
 **Category ACC of TMMLU+ (0 shot)**
 
-| Model | STEM | Social Science | Humanities | Other | Average |
+| Model | STEM | Social Science | Humanities | Other | ↑ AVG |
 |-------|------|----------------|------------|-------|-------|
-| gpt-3.5-turbo | 41.56 | 46.72 | 36.73 | 42.03 | |
-| Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 | |
-| Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 | |
-| **Breeze-7B-Instruct-v0.1** | 37.41 | 46.81 | 42.06 | 40.16 | |
-| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35 | 40.31 | 39.40 | |
-| Qwen-7B-Chat | 35.44 | 46.22 | 38.35 | 40.06 | |
-| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 | |
-| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 | |
-| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 | |
+| Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 | 54.87 |
+| Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 | 48.41 |
+| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 | 44.79 |
+| gpt-3.5-turbo | 41.56 | 46.72 | 36.73 | 42.03 | 41.76 |
+| **Breeze-7B-Instruct-v0.1** | 37.41 | 46.81 | 42.06 | 40.16 | 41.61 |
+| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35 | 40.31 | 39.40 | 40.99 |
+| Qwen-7B-Chat | 35.44 | 46.22 | 38.35 | 40.06 | 40.02 |
+| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 | 29.47 |
+| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 | 28.08 |
 
 
 
@@ -155,17 +154,17 @@ Breeze-7B-Instruct-64k-v0.1 can solve tasks such as question answering and summa
 In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
 All inferences run on 2 RTX A6000 GPUs (using `vllm`, with a tensor-parallel size of 2).
 
-| Models | Inference Time (sec) | Estimated Max Input Length (Char) |
+| Models | ↓ Inference Time (sec) | Estimated Max Input Length (Char) |
 |--------|------------------------|-----------------------------------|
-| Yi-6B | 10.62 | 5.2k |
-| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
-| **Breeze-7B-Instruct-64k-v0.1** | 10.74 | 88.8k |
-| Qwen-7B | 10.86 | 9.8k |
-| Qwen-14B | 18.89 | 9.8k |
-| Mistral-7B-v0.1 | 20.48 | 5.1k |
-| Taiwan-LLM-7B-v2.1-base | 26.26 | 2.2k |
-| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
-| Yi-34B | 43.71 | 4.5k |
+| Yi-6B | 10.62 | 5.2k |
+| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
+| **Breeze-7B-Instruct-64k-v0.1** | 10.74 | 88.8k |
+| Qwen-7B | 10.86 | 9.8k |
+| Qwen-14B | 18.89 | 9.8k |
+| Mistral-7B-v0.1 | 20.48 | 5.1k |
+| Taiwan-LLM-7B-v2.1-base | 26.26 | 2.2k |
+| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
+| Yi-34B | 43.71 | 4.5k |
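The timing setup described above maps onto vLLM's offline API roughly as follows. This is a sketch only: the exact prompt wording and decoding parameters of the test are not given here, so those are assumptions.

```python
# Sketch of the speed test: Breeze served by vLLM across 2 GPUs
# (tensor_parallel_size=2), timing a single long generation.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="MediaTek-Research/Breeze-7B-Instruct-v0.1",
          tensor_parallel_size=2)  # one weight shard per RTX A6000

prompt = "..."  # first 700 characters of the article + the rewrite request
params = SamplingParams(temperature=0.0, max_tokens=1024)  # assumed settings

start = time.time()
out = llm.generate([prompt], params)
print(f"inference time: {time.time() - start:.2f} sec")
print(out[0].outputs[0].text)
```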
 
 ## Long-context Performance
 
@@ -209,3 +208,14 @@ The suggested default `SYS_PROMPT` is
 ```txt
 You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
 ```
+
+## Citation
+
+```
+@article{breeze7b2024,
+  title={},
+  author={},
+  journal={arXiv},
+  year={2024}
+}
+```
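A minimal generation sketch using the suggested `SYS_PROMPT` quoted in the last hunk. It assumes the Mistral-style `[INST]` template that Breeze-7B inherits from its base model; check the full model card for the exact prompt format.

```python
# Sketch: prepend the suggested SYS_PROMPT, assuming a Mistral-style template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SYS_PROMPT = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")

name = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto")

# Example query: "Please introduce Taiwan's night market culture."
prompt = f"{SYS_PROMPT} [INST] 請介紹台灣的夜市文化。 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```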