Qwen

yangapku committed
Commit 1a2571e
1 Parent(s): 3f9f12c

Update README.md

Files changed (1):
  1. README.md +10 -10

README.md CHANGED
@@ -100,7 +100,7 @@ For more information, please refer to our [Github repo](https://github.com/QwenL
 The details of the model architecture of Qwen-7B-Chat are listed as follows
 
 | Hyperparameter | Value |
-|:--------------:|------:|
+|:------|:------|
 | n_layers | 32 |
 | n_heads | 32 |
 | d_model | 4096 |
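All ten hunks in this commit make the same kind of change: the separator row of a Markdown table controls column alignment, and the new separators left-align the label column while centering or left-aligning the value columns. A minimal illustration of the syntax:

```markdown
| Left | Center | Right |
|:-----|:------:|------:|
| a    |   b    |     c |
```

A leading colon (`:---`) left-aligns a column, colons on both ends (`:---:`) center it, and a trailing colon (`---:`) right-aligns it; a bare `---` falls back to the renderer's default.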
@@ -139,7 +139,7 @@ Note: Due to rounding errors caused by hardware and framework, differences in re
 We demonstrate the zero-shot accuracy of Qwen-7B-Chat on C-Eval validation set
 
 | Model | Avg. Acc. |
-|:--------------:|------:|
+|:--------------|:------:|
 | LLaMA2-7B-Chat | 31.9 |
 | LLaMA2-13B-Chat | 40.6 |
 | Chinese-Alpaca-2-7B | 41.3 |
@@ -154,7 +154,7 @@ C-Eval测试集上,Qwen-7B-Chat模型的zero-shot准确率结果如下:
 The zero-shot accuracy of Qwen-7B-Chat on C-Eval testing set is provided below:
 
 | Model | Avg. | STEM | Social Sciences | Humanities | Others |
-|:--------------:|------:|------:|------:|------:|------:|
+|:--------------|:------:|:------:|:------:|:------:|:------:|
 | Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
 | Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
 | ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
@@ -175,7 +175,7 @@ The zero-shot accuracy of Qwen-7B-Chat on MMLU is provided below.
 The performance of Qwen-7B-Chat still on the top between other human-aligned models with comparable size.
 
 | Model | Avg. Acc. |
-|:--------------:|------:|
+|:--------------|:------:|
 | ChatGLM2-6B-Chat | 45.5 |
 | LLaMA2-7B-Chat | 47.0 |
 | InternLM-7B-Chat | 50.8 |
@@ -190,7 +190,7 @@ Qwen-7B-Chat在[HumanEval](https://github.com/openai/human-eval)的zero-shot Pas
 The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/human-eval) is demonstrated below
 
 | Model | Pass@1 |
-|:--------------:|------:|
+|:--------------|:------:|
 | LLaMA2-7B-Chat | 12.2 |
 | InternLM-7B-Chat | 14.0 |
 | Baichuan-13B-Chat | 16.5 |
@@ -204,7 +204,7 @@ The zero-shot Pass@1 of Qwen-7B-Chat on [HumanEval](https://github.com/openai/hu
 The accuracy of Qwen-7B-Chat on GSM8K is shown below
 
 | Model | Zero-shot Acc. | 4-shot Acc. |
-|:--------------:|------:|------:|
+|:--------------|:------:|:------:|
 | ChatGLM2-6B-Chat | - | 28.0 |
 | LLaMA2-7B-Chat | 20.4 | 28.2 |
 | LLaMA2-13B-Chat | 29.4 | 36.7 |
@@ -224,7 +224,7 @@ We introduce NTK-aware interpolation, LogN attention scaling to extend the conte
 **(To use these tricks, please set `use_dynamic_ntk` and `use_long_attn` to true in config.json.)**
 
 | Model | VCSUM (zh) |
-|----------------|-------|
+|:----------------|:-------:|
 | GPT-3.5-Turbo-16k | 16.0 |
 | LLama2-7B-Chat | 0.2 |
 | InternLM-7B-Chat | 13.0 |
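The bolded note in the hunk above can be scripted. A minimal sketch using only the standard library, assuming a locally downloaded checkpoint directory (the path is illustrative) and the two flag names exactly as the README states them:

```python
import json
from pathlib import Path

# Illustrative path to a locally downloaded Qwen-7B-Chat checkpoint;
# the two flag names below are taken from the README note above.
cfg_path = Path("Qwen-7B-Chat/config.json")
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}

# Enable NTK-aware interpolation and LogN attention scaling for long contexts
cfg["use_dynamic_ntk"] = True
cfg["use_long_attn"] = True

print(json.dumps({k: cfg[k] for k in ("use_dynamic_ntk", "use_long_attn")}))
```

Write the dict back to `config.json` (or pass the flags when constructing the model config) before loading the model, so the long-context path is active at inference time.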
@@ -240,7 +240,7 @@ We introduce NTK-aware interpolation, LogN attention scaling to extend the conte
 Qwen-7B-Chat supports calling plugins/tools/APIs through [ReAct Prompting](https://arxiv.org/abs/2210.03629). ReAct is also one of the main approaches used by the [LangChain](https://python.langchain.com/) framework. In the soon-to-be-released evaluation benchmark for assessing tool usage capabilities, Qwen-7B-Chat's performance is as follows:
 
 | Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
-|------------------|------------------------|-----------------------|-----------------------|
+|:-----------------|:----------------------:|:---------------------:|:---------------------:|
 | GPT-4 | 95% | **0.90** | 15% |
 | GPT-3.5 | 85% | 0.88 | 75% |
 | **Qwen-7B-Chat** | **99%** | 0.89 | **8.5%** |
@@ -263,7 +263,7 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct
 Qwen-7B-Chat also has the capability to be used as a [HuggingFace Agent](https://huggingface.co/docs/transformers/transformers_agents). Its performance on the run-mode benchmark provided by HuggingFace is as follows:
 
 | Model | Tool Selection↑ | Tool Used↑ | Code↑ |
-|-|-|-|-|
+|:-|:-:|:-:|:-:|
 |GPT-4 | **100** | **100** | **97.41** |
 |GPT-3.5 | 95.37 | 96.30 | 87.04 |
 |StarCoder-15.5B | 87.04 | 87.96 | 68.89 |
@@ -303,7 +303,7 @@ model = AutoModelForCausalLM.from_pretrained(
 With this method, it is available to load Qwen-7B-Chat in `NF4`and `Int8`, which saves you memory usage. We provide related statistics of model performance below. We find that the quantization downgrades the effectiveness slightly but significantly increases inference efficiency and reduces memory costs.
 
 | Precision | MMLU | Memory |
-| :---------: | -------: | -----: |
+| :---------| :-------: | :-----: |
 | BF16 | 56.7 | 16.2G |
 | Int8 | 52.8 | 10.1G |
 | NF4 | 48.9 | 7.4G |
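The last hunk's context mentions loading Qwen-7B-Chat in `NF4` and `Int8`, with the `from_pretrained` call cut off at the hunk header. A hedged sketch of what such a call might look like with `transformers` and `bitsandbytes`; the exact arguments the README recommends are outside this diff, so treat the parameters below as assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed NF4 setup via bitsandbytes; swap load_in_4bit for load_in_8bit=True
# (and drop the bnb_4bit_* fields) to reproduce the Int8 row of the table.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # Qwen's modeling code ships with the checkpoint
)
```

Running this downloads the checkpoint, so it is shown for orientation only; the MMLU/memory trade-offs in the table above quantify what each precision choice costs.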
 