Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)

Qwen-14B - GGUF
- Model creator: https://huggingface.co/Qwen/
- Original model: https://huggingface.co/Qwen/Qwen-14B/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Qwen-14B.Q2_K.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q2_K.gguf) | Q2_K | 5.41GB |
| [Qwen-14B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.IQ3_XS.gguf) | IQ3_XS | 6.12GB |
| [Qwen-14B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.IQ3_S.gguf) | IQ3_S | 6.31GB |
| [Qwen-14B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q3_K_S.gguf) | Q3_K_S | 6.31GB |
| [Qwen-14B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.IQ3_M.gguf) | IQ3_M | 6.87GB |
| [Qwen-14B.Q3_K.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q3_K.gguf) | Q3_K | 7.16GB |
| [Qwen-14B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q3_K_M.gguf) | Q3_K_M | 7.16GB |
| [Qwen-14B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q3_K_L.gguf) | Q3_K_L | 7.44GB |
| [Qwen-14B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.IQ4_XS.gguf) | IQ4_XS | 7.37GB |
| [Qwen-14B.Q4_0.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q4_0.gguf) | Q4_0 | 7.62GB |
| [Qwen-14B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.IQ4_NL.gguf) | IQ4_NL | 7.68GB |
| [Qwen-14B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q4_K_S.gguf) | Q4_K_S | 7.96GB |
| [Qwen-14B.Q4_K.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q4_K.gguf) | Q4_K | 8.8GB |
| [Qwen-14B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q4_K_M.gguf) | Q4_K_M | 8.8GB |
| [Qwen-14B.Q4_1.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q4_1.gguf) | Q4_1 | 8.4GB |
| [Qwen-14B.Q5_0.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q5_0.gguf) | Q5_0 | 9.18GB |
| [Qwen-14B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q5_K_S.gguf) | Q5_K_S | 9.34GB |
| [Qwen-14B.Q5_K.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q5_K.gguf) | Q5_K | 10.14GB |
| [Qwen-14B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q5_K_M.gguf) | Q5_K_M | 10.14GB |
| [Qwen-14B.Q5_1.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q5_1.gguf) | Q5_1 | 9.96GB |
| [Qwen-14B.Q6_K.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q6_K.gguf) | Q6_K | 11.46GB |
| [Qwen-14B.Q8_0.gguf](https://huggingface.co/RichardErkhov/Qwen_-_Qwen-14B-gguf/blob/main/Qwen-14B.Q8_0.gguf) | Q8_0 | 14.03GB |

Original model description:

---
language:
- zh
- en
tags:
- qwen
pipeline_tag: text-generation
inference: false
---

# Qwen-14B
🤗 Hugging Face | 🤖 ModelScope | 📑 Paper | 🖥️ Demo
WeChat (微信) | Discord | API
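The GGUF files in the table above can be run with any llama.cpp-compatible runtime. A minimal sketch using the `huggingface_hub` and `llama-cpp-python` packages; the choice of the Q4_K_M file is only an example, and any quant from the table works the same way:

```python
# Download one of the GGUF quantizations listed above and run it locally.
# Requires: pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the Q4_K_M file (substitute any filename from the table above).
model_path = hf_hub_download(
    repo_id="RichardErkhov/Qwen_-_Qwen-14B-gguf",
    filename="Qwen-14B.Q4_K_M.gguf",
)

# Qwen-14B is a base (non-chat) model, so use plain text completion.
llm = Llama(model_path=model_path, n_ctx=2048)
out = llm("The capital of France is", max_tokens=16)
print(out["choices"][0]["text"])
```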
For position encoding, the FFN activation function, and normalization, we adopt the prevalent practices: RoPE relative position encoding, SwiGLU as the activation function, and RMSNorm for normalization (with optional installation of flash-attention for acceleration).
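For reference, here is a minimal PyTorch sketch of the two components named above; the shapes and class names are illustrative, not the model's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of the features (no mean subtraction)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU FFN: a SiLU-gated linear unit followed by a down projection."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```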
For tokenization, compared with current mainstream open-source models based on Chinese and English vocabularies, Qwen-14B uses a vocabulary of over 150K tokens. It prioritizes efficient encoding of Chinese, English, and code data, and is also friendlier to other languages, enabling users to directly improve capability in some languages without expanding the vocabulary. It splits numbers into single digits and calls the [tiktoken](https://github.com/openai/tiktoken) library for efficient tokenization.
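A quick way to inspect this behavior, assuming the `transformers` package and access to the upstream repo (the example strings are illustrative):

```python
from transformers import AutoTokenizer

# The Qwen tokenizer ships as custom code, hence trust_remote_code=True.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen-14B", trust_remote_code=True)

print(tok.vocab_size)            # a vocabulary of over 150K tokens
print(tok.tokenize("12345"))     # numbers are split digit by digit
print(tok.tokenize("你好，世界"))  # compact encoding of Chinese text
```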
We randomly sampled 1 million documents per language and compared the encoding compression rates of different models (with XLM-R, which supports 100 languages, as the baseline value of 1).
As can be seen, while ensuring efficient encoding of Chinese, English, and code, Qwen-14B also achieves a high compression rate for many other languages (e.g., th, he, ar, ko, vi, ja, tr, id, pl, ru, nl, pt, it, de, es, fr), giving the model strong scalability as well as high training and inference efficiency in these languages.
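A minimal sketch of how such a comparison can be computed; treating the compression rate as a token-count ratio relative to XLM-R is an assumption about the methodology, and the one-document corpus is a placeholder for the ~1M sampled documents:

```python
from transformers import AutoTokenizer

qwen = AutoTokenizer.from_pretrained("Qwen/Qwen-14B", trust_remote_code=True)
xlmr = AutoTokenizer.from_pretrained("xlm-roberta-base")

def token_count(tok, texts):
    # Total tokens needed to encode the corpus with a given tokenizer.
    return sum(len(tok.encode(t, add_special_tokens=False)) for t in texts)

# Hypothetical Thai corpus; in practice this would be ~1M documents.
corpus_th = ["ภาษาไทยเป็นภาษาราชการของประเทศไทย"]

# A ratio below 1 means Qwen needs fewer tokens than XLM-R for the same
# text, i.e. better compression for that language.
ratio = token_count(qwen, corpus_th) / token_count(xlmr, corpus_th)
print(f"Qwen/XLM-R token ratio (th): {ratio:.2f}")
```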
For pre-training data, Qwen-14B uses part of the open-source generic corpora on the one hand, and a massive amount of accumulated web corpus and high-quality text content on the other. The corpus exceeds 3T tokens after deduplication and filtering, encompassing web text, encyclopedias, books, code, mathematics, and various domains.
## Evaluation
We selected MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, and CMMLU, which are currently popular benchmarks, to comprehensively evaluate the model's Chinese and English knowledge, translation, mathematical reasoning, coding, and other capabilities. As the results below show, the Qwen models achieve the best performance among similarly sized open-source models on all benchmarks.
| Model | MMLU | C-Eval | GSM8K | MATH | HumanEval | MBPP | BBH | CMMLU |
|:-------------------|:--------:|:--------:|:--------:|:--------:|:---------:|:--------:|:--------:|:--------:|
| | 5-shot | 5-shot | 8-shot | 4-shot | 0-shot | 3-shot | 3-shot | 5-shot |
| LLaMA2-7B | 46.8 | 32.5 | 16.7 | 3.3 | 12.8 | 20.8 | 38.2 | 31.8 |
| LLaMA2-13B | 55.0 | 41.4 | 29.6 | 5.0 | 18.9 | 30.3 | 45.6 | 38.4 |
| LLaMA2-34B | 62.6 | - | 42.2 | 6.2 | 22.6 | 33.0 | 44.1 | - |
| ChatGLM2-6B | 47.9 | 51.7 | 32.4 | 6.5 | - | - | 33.7 | - |
| InternLM-7B | 51.0 | 53.4 | 31.2 | 6.3 | 10.4 | 14.0 | 37.0 | 51.8 |
| InternLM-20B | 62.1 | 58.8 | 52.6 | 7.9 | 25.6 | 35.6 | 52.5 | 59.0 |
| Baichuan2-7B | 54.7 | 56.3 | 24.6 | 5.6 | 18.3 | 24.2 | 41.6 | 57.1 |
| Baichuan2-13B | 59.5 | 59.0 | 52.8 | 10.1 | 17.1 | 30.2 | 49.0 | 62.0 |
| Qwen-7B (original) | 56.7 | 59.6 | 51.6 | - | 24.4 | 31.2 | 40.6 | 58.8 |
| **Qwen-7B** | 58.2 | 63.5 | 51.7 | 11.6 | 29.9 | 31.6 | 45.0 | 62.2 |
| **Qwen-14B** | **66.3** | **72.1** | **61.3** | **24.8** | **32.3** | **40.8** | **53.4** | **71.0** |
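The shot counts in the second header row refer to the number of solved in-context exemplars prepended to each test question. A minimal illustration of how such an n-shot prompt is assembled; the exemplars here are placeholders, not the actual benchmark data:

```python
# Build an n-shot prompt: n solved exemplars followed by the test question.
def build_few_shot_prompt(exemplars, question, n_shot=5):
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars[:n_shot]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Placeholder exemplars; real evaluations draw these from the benchmark's dev split.
exemplars = [("2 + 2 = ?", "4"), ("3 * 3 = ?", "9")]
print(build_few_shot_prompt(exemplars, "7 - 5 = ?", n_shot=2))
```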
### Long-Context Evaluation
We introduce NTK-aware interpolation, LogN attention scaling, window attention, and other techniques to extend the context length of Qwen-7B (original) and Qwen-14B from 2K to over 8K tokens, and that of Qwen-7B from 8K to 32K tokens. We conduct language modeling experiments on the arXiv dataset with PPL as the metric; results at different sequence lengths are shown below.

**(To enable NTK interpolation and LogN attention scaling, set `use_dynamic_ntk` and `use_logn_attn` to `true` in config.json.)**
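A minimal sketch of flipping those flags from Python instead of editing config.json by hand, assuming the `transformers` package; the flag names come from the note above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-14B", trust_remote_code=True)
config.use_dynamic_ntk = True   # NTK-aware interpolation for longer contexts
config.use_logn_attn = True     # LogN attention scaling

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B", config=config, trust_remote_code=True
)
```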
Columns give the evaluation sequence length; values are perplexity (lower is better).

| Model | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 |
|:-----------------------------------|-----:|-----:|------:|-------:|--------:|-------:|
| Qwen-7B (original)                 | 4.23 | 3.78 | 39.35 | 469.81 | 2645.09 | -      |
| + dynamic_ntk                      | 4.23 | 3.78 | 3.59  | 3.66   | 5.71    | -      |
| + dynamic_ntk + logn               | 4.23 | 3.78 | 3.58  | 3.56   | 4.62    | -      |
| + dynamic_ntk + logn + window_attn | 4.23 | 3.78 | 3.58  | 3.49   | 4.32    | -      |
| Qwen-7B                            | 4.23 | 3.81 | 3.52  | 3.31   | 7.27    | 181.49 |
| + dynamic_ntk + logn + window_attn | 4.23 | 3.81 | 3.52  | 3.33   | 3.22    | 3.17   |
| Qwen-14B                           | -    | 3.46 | 22.79 | 334.65 | 3168.35 | -      |
| + dynamic_ntk + logn + window_attn | -    | 3.46 | 3.29  | 3.18   | 3.42    | -      |
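A minimal sketch of measuring perplexity at a fixed sequence length, in the spirit of the table above; the chunking scheme and the placeholder `long_document` are simplifying assumptions, not the exact evaluation script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-14B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B", trust_remote_code=True,
    torch_dtype=torch.float16, device_map="auto",
)

def perplexity(text: str, seq_len: int) -> float:
    """Perplexity as exp(mean loss) over non-overlapping chunks of length seq_len."""
    ids = tok.encode(text, add_special_tokens=False)
    losses = []
    for i in range(0, len(ids) - seq_len + 1, seq_len):
        chunk = torch.tensor(ids[i : i + seq_len], device=model.device).unsqueeze(0)
        with torch.no_grad():
            out = model(chunk, labels=chunk)  # shifted LM loss computed internally
        losses.append(out.loss.item())
    return float(torch.tensor(losses).mean().exp())

# long_document would be e.g. a concatenated arXiv paper.
# print(perplexity(long_document, seq_len=8192))
```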