Update README.md

<!-- Provide a quick summary of what the model is/does. -->

## Introduction

Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). With 13 billion parameters, it achieves the best results on standard Chinese and English benchmarks among models of its size. This release includes two versions: a pre-trained model (Baichuan-13B-Base) and an aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:

1. **Larger size, more data**: Baichuan-13B expands the parameter count to 13 billion on top of [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data to date. It supports both Chinese and English, uses [ALiBi](https://arxiv.org/abs/2108.12409) position encoding (sketched in the code after this list), and has a context window of 4,096 tokens.

2. **Pre-training and alignment models open-sourced together**: The pre-trained model is a "base" aimed at developers, while the general public has a stronger need for an aligned model with dialogue capabilities. This release therefore also includes the aligned model, Baichuan-13B-Chat, which has strong dialogue abilities, works out of the box, and can be deployed with just a few lines of code.

3. **More efficient inference**: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions, which can be conveniently deployed on consumer GPUs such as the 3090 with almost no loss in quality (see the quantization sketch after this list).

4. **Open source, free, and commercially usable**: Baichuan-13B is fully open to academic research, and developers can also use it commercially, free of charge, after applying by email and obtaining official commercial permission.
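
ALiBi, referenced in point 1, replaces learned position embeddings with a fixed, head-specific linear penalty added to attention scores. The sketch below is a minimal illustration following the ALiBi paper, not Baichuan's actual implementation; the `alibi_bias` helper and its slope schedule are assumptions for demonstration.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # One slope per head, a geometric sequence as in the ALiBi paper
    # (exact for head counts that are powers of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # distance[i, j] = j - i: zero on the diagonal, negative for past keys.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]
    # Bias added to attention scores before softmax; under a causal mask only
    # the j <= i entries matter, and more distant keys get larger penalties.
    return slopes[:, None, None] * distance[None, :, :]
```

Because the penalty grows linearly with distance rather than being tied to trained positions, ALiBi tends to degrade gracefully beyond the training length, which suits a 4,096-token context window.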
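For the quantized versions in point 3, the Baichuan model cards describe a `quantize()` helper shipped in the model's remote code. A minimal sketch, assuming that helper exists in the revision you download (check the model card if the API differs):

```python
import torch
from transformers import AutoModel

# Load weights in fp16 first, then quantize in place; quantize(8) targets
# INT8 and quantize(4) targets INT4 (assumed API from the remote code).
model = AutoModel.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model = model.quantize(8).cuda()
```
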
## How to Get Started with the Model

```python
import torch
from transformers import AutoModel, AutoTokenizer, GenerationConfig

# The custom Baichuan code requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModel.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# Chat history is a list of {"role", "content"} dicts; chat() returns the reply.
messages = []
messages.append({"role": "user", "content": "Which mountain is the second highest one in the world?"})
response = model.chat(tokenizer, messages)
print(response)
```

We conducted a `5-shot` evaluation under various benchmarks, using the same method.

| **Baichuan-13B-Base** | **41.7** | **61.1** | **59.8** | **59.0** | **56.4** | **55.3** |
| **Baichuan-13B-Chat** | **42.8** | **62.6** | **59.7** | **59.0** | **56.1** | **55.8** |

> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess a language model's knowledge and reasoning abilities in Chinese contexts. We used its official [evaluation protocol](https://github.com/haonan-li/CMMLU).

## Our Group
![WeChat](https://github.com/baichuan-inc/baichuan-7B/blob/main/media/wechat.jpeg?raw=true)