---
language:
- zh
- en
pipeline_tag: text-generation
inference: false
---

# Baichuan-13B-Chat

<!-- Provide a quick summary of what the model is/does. -->

## Introduction

Baichuan-13B-Chat is the aligned version in the Baichuan-13B model series; the pre-trained model is available at [Baichuan-13B-Base](https://github.com/baichuan-inc/Baichuan-13B-Base).

Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best results among models of its size on standard Chinese and English benchmarks. This release includes two versions: a pre-trained model (Baichuan-13B-Base) and an aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:

1. **Larger size, more data**: Building on [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B), Baichuan-13B scales the parameter count to 13 billion and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B. At the time of release it is the open-source model of 13B size trained on the most data. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
2. **Pre-trained and aligned models released together**: The pre-trained model is a "base" suited to developers, while most ordinary users have a stronger need for an aligned model with dialogue capabilities. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong dialogue capabilities, works out of the box, and can be deployed with a few lines of code.
3. **More efficient inference**: To support a wider range of users, INT8 and INT4 quantized versions are released as well. They make it easy to deploy the model on consumer-grade GPUs such as the 3090 with almost no loss in quality.
4. **Open-source, free, and commercially usable**: Baichuan-13B is fully open for academic research, and developers can also use it commercially free of charge after applying by email and receiving an official commercial license.

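To make the quantization point concrete, the sketch below estimates the memory needed for the weights alone at each precision. It uses the total parameter count listed in this card; the byte sizes per parameter are the usual ones for FP16, INT8, and INT4, and the 24 GB figure is the memory of an RTX 3090. Activations, the key/value cache, and quantization metadata are ignored, so real usage will be higher. The function name `weight_memory_gb` is made up for this illustration.

```python
# Weight-only memory estimate for Baichuan-13B at different precisions.
# Assumption: activations, KV cache, and quantization scales are ignored.
TOTAL_PARAMS = 13_264_901_120  # total parameter count listed in this card
GPU_MEMORY_GB = 24             # memory of an RTX 3090

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the storage needed for the weights in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gigabytes in estimates.items():
    verdict = "fits" if gigabytes < GPU_MEMORY_GB else "does not fit"
    print(f"{name}: {gigabytes:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

By this rough estimate the FP16 weights alone exceed a 24 GB card, while the INT8 and INT4 weights leave headroom for activations.
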
## How to Get Started with the Model

The following example runs a single-turn conversation with Baichuan-13B-Chat. The expected output is "K2. The world's second highest peak - K2, also known as Mount Godwin-Austen or Chhogori, with an altitude of 8611 meters, is located on the China-Pakistan border in the Karakoram Range."

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# trust_remote_code=True is needed because the repository ships its own tokenizer and model code.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)

# Load the causal language model in half precision and let device_map="auto" place it on the available devices.
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)

# Use the generation settings published with the checkpoint.
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# A conversation is a list of {"role", "content"} messages.
# The Chinese prompt "世界上第二高的山峰是哪座" asks the same question.
messages = []
messages.append({"role": "user", "content": "Which mountain is the second highest one in the world?"})

response = model.chat(tokenizer, messages)
print(response)
```

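For multi-turn conversations, the `messages` list can carry the whole history. The sketch below assumes that `model.chat(tokenizer, messages)` returns the reply as a string, as the `print(response)` call in the example suggests, and that replies are recorded under an `"assistant"` role, which this card does not state. `chat_turn` and `EchoModel` are hypothetical names; `EchoModel` is only a stand-in so the sketch runs without downloading the weights.

```python
from typing import Dict, List

Message = Dict[str, str]


def chat_turn(model, tokenizer, messages: List[Message], user_text: str) -> str:
    """Append the user's message, query the model, and record its reply."""
    messages.append({"role": "user", "content": user_text})
    reply = model.chat(tokenizer, messages)
    # Assumption: assistant replies are stored under the "assistant" role.
    messages.append({"role": "assistant", "content": reply})
    return reply


class EchoModel:
    """Hypothetical stand-in exposing the same chat(tokenizer, messages) call."""

    def chat(self, tokenizer, messages: List[Message]) -> str:
        return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
first = chat_turn(EchoModel(), None, history, "Which mountain is the second highest one in the world?")
second = chat_turn(EchoModel(), None, history, "How high is it?")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

With the real model, pass the `model` and `tokenizer` objects from the example above instead of `EchoModel()` and `None`.
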
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** 百川智能(Baichuan Intelligent Technology)
- **Email**: [email protected]
- **Language(s) (NLP):** Chinese/English
- **License:** [Baichuan-13B License]()

### Model Sources

<!-- Provide the basic links for the model. -->

The overall architecture is based on Baichuan-7B. For better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference speed. Compared with the standard LLaMA-13B, the average inference speed (tokens/s) when generating 2,000 tokens was measured to be 31.6% higher:

| Model        | tokens/s |
|--------------|----------|
| LLaMA-13B    | 19.4     |
| Baichuan-13B | 25.4     |

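As a quick check, the relative speed-up can be recomputed from the two throughput values in the table. With the rounded values shown, it comes out slightly below the reported 31.6%, so that figure was presumably computed from unrounded measurements; this is an assumption, as the card does not say.

```python
# Relative speed-up computed from the rounded tokens/s values in the table.
llama_13b_tps = 19.4
baichuan_13b_tps = 25.4

speedup_pct = (baichuan_13b_tps - llama_13b_tps) / llama_13b_tps * 100
print(f"Speed-up from rounded table values: {speedup_pct:.1f}%")  # prints 30.9%
```
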
The detailed parameters are listed in the following table:

| Model Name   | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params   | Training Data (tokens) | Position Embedding                        | Max Length |
|--------------|-------------|------------|---------------------|------------|----------------|------------------------|-------------------------------------------|------------|
| Baichuan-7B  | 4,096       | 32         | 32                  | 64,000     | 7,000,559,616  | 1.2 trillion           | [RoPE](https://arxiv.org/abs/2104.09864)  | 4,096      |
| Baichuan-13B | 5,120       | 40         | 40                  | 64,000     | 13,264,901,120 | 1.4 trillion           | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096      |

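The total parameter count in the table can be reproduced from the other columns under a few assumptions that this card does not state: a LLaMA-style decoder layer (four square attention projections and a gated MLP with three matrices), no bias terms, one norm weight vector per normalization layer, separate input and output embedding matrices, and an MLP intermediate size of 13,696. ALiBi itself adds no learned parameters. Treat the breakdown as a plausibility check, not as the official architecture description.

```python
# Parameter count for Baichuan-13B from the table's hyperparameters.
HIDDEN_SIZE = 5_120
NUM_LAYERS = 40
VOCAB_SIZE = 64_000
INTERMEDIATE_SIZE = 13_696  # assumption: not listed in this card

# Attention: query, key, value, and output projections, each hidden x hidden.
attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE
# Gated MLP: gate, up, and down projections.
mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE
# Two normalization layers per block, one weight vector each.
norms = 2 * HIDDEN_SIZE

per_layer = attention + mlp + norms
# Assumption: input embedding and output head are separate (untied) matrices.
embeddings = 2 * VOCAB_SIZE * HIDDEN_SIZE
final_norm = HIDDEN_SIZE

total_params = NUM_LAYERS * per_layer + embeddings + final_norm
print(f"{total_params:,}")  # prints 13,264,901,120
```

Under these assumptions the sum matches the table's figure exactly.
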
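The table lists ALiBi as the position encoding of Baichuan-13B. As a rough illustration of the idea, the sketch below computes per-head slopes with the scheme described in the ALiBi paper and builds the linear bias that is added to the attention scores. It is not taken from the Baichuan-13B code, whose implementation may differ; `alibi_slopes` and `alibi_bias` are names chosen for this example, and the causal mask is folded into the bias as `-inf` for simplicity.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head slopes following the scheme described in the ALiBi paper."""

    def power_of_two_slopes(n: int) -> List[float]:
        start = 2.0 ** (-8.0 / n)  # first slope; the rest are its successive powers
        return [start ** (i + 1) for i in range(n)]

    if num_heads & (num_heads - 1) == 0:  # num_heads is a power of two
        return power_of_two_slopes(num_heads)
    # Otherwise take the slopes for the closest smaller power of two and
    # fill the remainder with every other slope of the next power of two.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = power_of_two_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] = -slope_h * (i - j) for keys j <= query i, else -inf."""
    slopes = alibi_slopes(num_heads)
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


bias = alibi_bias(num_heads=40, seq_len=4)
print(len(bias), "heads; bias for the last query of head 0:", bias[0][3])
```

Because the bias depends only on the distance between query and key, nothing has to be applied to the query and key vectors themselves, unlike the per-position rotation used by Rotary Embedding.
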
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

We have also open-sourced the training code that accompanies this model, which allows efficient fine-tuning for downstream tasks. For details, see [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B).

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

Production use without adequate assessment of risks and mitigation measures; any use case that may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Baichuan-13B can produce factually incorrect output and should not be relied on for factually accurate information. Baichuan-13B was trained on various public datasets. Although great effort went into cleaning the pre-training data, the model may still generate lewd, biased, or otherwise offensive output.

## Training Details

For the specific training settings, please refer to [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B).

## Evaluation

### Benchmark Results

We ran `5-shot` evaluations on several benchmarks, using the same method as the [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/) project. The results are as follows:

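In a `5-shot` setup, each test question is preceded by five solved examples from the same subject. The sketch below shows one generic way to assemble such a prompt for multiple-choice questions. The layout, the `Answer:` marker, and the function names are assumptions made for illustration and are not the template used by the Baichuan evaluation code; the toy questions are made up.

```python
from typing import List, Optional, Tuple

Example = Tuple[str, List[str], str]  # (question, four choices, answer letter)


def format_example(question: str, choices: List[str], answer: Optional[str] = None) -> str:
    """Render one multiple-choice item; leave the answer open if none is given."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip("ABCD", choices)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(shots: List[Example], question: str, choices: List[str]) -> str:
    """Concatenate the solved examples, then the unanswered test question."""
    blocks = [format_example(q, c, a) for q, c, a in shots]
    blocks.append(format_example(question, choices))
    return "\n\n".join(blocks)


# Five toy examples standing in for a benchmark's development set.
shots: List[Example] = [
    (f"What is {n} + {n}?", [str(2 * n), str(2 * n + 1), str(2 * n + 2), str(2 * n + 3)], "A")
    for n in range(1, 6)
]
prompt = build_few_shot_prompt(shots, "What is 7 + 7?", ["12", "13", "14", "15"])
print(prompt)
```

The model's prediction is then read from what it generates, or scores highest, after the final `Answer:` marker.
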
### C-Eval

| Model 5-shot             | STEM     | Social Sciences | Humanities | Others   | Average  |
|--------------------------|----------|-----------------|------------|----------|----------|
| ChatGLM2-6B              | 45.9     | 61.6            | 49.7       | 48.2     | 50.2     |
| InternLM-7B<sup>*</sup>  | 40.1     | 55.7            | 49.4       | 37.9     | 44.6     |
| Baichuan-7B              | 38.2     | 52.0            | 46.2       | 39.3     | 42.8     |
| Ziya-LLaMA-13B-Pretrain  | 27.6     | 34.4            | 32.0       | 28.6     | 30.0     |
| LLaMA-13B                | 27.0     | 33.6            | 27.7       | 27.6     | 28.5     |
| moss-moon-003-base (16B) | 27.0     | 29.1            | 27.2       | 26.9     | 27.4     |
| vicuna-13B               | 22.8     | 24.8            | 22.3       | 18.5     | 22.2     |
| **Baichuan-13B-Base**    | **45.9** | **63.5**        | **57.2**   | **49.3** | **52.4** |
| **Baichuan-13B-Chat**    | **43.7** | **64.6**        | **56.2**   | **49.2** | **51.5** |

> \*Note: The results for all models in this table were obtained with the same evaluation code. [InternLM-7B](https://huggingface.co/internlm/internlm-7b) reports a C-Eval average of 53.4 when evaluated with [OpenCompass](https://opencompass.org.cn/rank); when we evaluated InternLM-7B with OpenCompass, we obtained an average of 51.6.

### MMLU

| Model 5-shot             | STEM     | Social Sciences | Humanities | Others   | Average  |
|--------------------------|----------|-----------------|------------|----------|----------|
| LLaMA-13B                | 36.1     | 53.0            | 44.0       | 52.8     | 46.3     |
| ChatGLM2-6B              | 38.2     | 52.5            | 43.2       | 50.8     | 45.9     |
| InternLM-7B              | 38.0     | 51.1            | 39.2       | 50.2     | 44.1     |
| Ziya-LLaMA-13B-Pretrain  | 35.6     | 47.6            | 40.1       | 49.4     | 42.9     |
| Baichuan-7B              | 35.6     | 48.9            | 38.4       | 48.1     | 42.3     |
| vicuna-13B               | 24.2     | 24.1            | 24.6       | 26.8     | 24.9     |
| moss-moon-003-base (16B) | 22.4     | 22.8            | 24.2       | 24.4     | 23.6     |
| **Baichuan-13B-Base**    | **41.6** | **60.9**        | **47.4**   | **58.5** | **51.6** |
| **Baichuan-13B-Chat**    | **40.9** | **60.9**        | **48.8**   | **59.0** | **52.1** |

### CMMLU

| Model 5-shot             | STEM     | Humanities | Social Sciences | Others   | China Specific | Average  |
|--------------------------|----------|------------|-----------------|----------|----------------|----------|
| InternLM-7B              | 41.7     | 54.4       | 56.4            | 55.4     | 53.1           | 52.1     |
| ChatGLM2-6B              | 42.5     | 51.4       | 51.4            | 50.7     | 48.4           | 49.0     |
| Baichuan-7B              | 34.4     | 47.5       | 47.6            | 46.6     | 44.3           | 44.0     |
| Ziya-LLaMA-13B-Pretrain  | 29.0     | 30.7       | 33.8            | 34.4     | 31.9           | 32.1     |
| LLaMA-13B                | 29.2     | 30.8       | 31.6            | 33.0     | 30.5           | 31.2     |
| moss-moon-003-base (16B) | 27.2     | 30.4       | 28.8            | 32.6     | 28.7           | 29.6     |
| vicuna-13B               | 24.0     | 25.4       | 25.3            | 25.0     | 25.0           | 24.9     |
| **Baichuan-13B-Base**    | **41.7** | **61.1**   | **59.8**        | **59.0** | **56.4**       | **55.3** |
| **Baichuan-13B-Chat**    | **42.8** | **62.6**   | **59.7**        | **59.0** | **56.1**       | **55.8** |

> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We adopted its official [evaluation scheme](https://github.com/haonan-li/CMMLU).

## Our Group

![WeChat](https://github.com/baichuan-inc/baichuan-7B/blob/main/media/wechat.jpeg?raw=true)