Update README.md
README.md CHANGED
@@ -17,7 +17,16 @@ Baichuan-13B 是由百川智能继 [Baichuan-7B](https://github.com/baichuan-inc
 3. **同时开源预训练和对齐模型**:预训练模型是适用开发者的”基座“,而广大普通用户对有对话功能的对齐模型具有更强的需求。因此本次开源我们同时发布了对齐模型(Baichuan-13B-Chat),具有很强的对话能力,开箱即用,支持很简单的部署。
 4. **更高效的推理**:为了支持更广大用户的使用,我们本次同时开源了 int8 和 int4 的量化版本,在几乎没有效果损失的情况下可以很方便的将模型部署在低显存机器上。
 
+## Introduction
+Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligence as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best results among models of its size on standard Chinese and English benchmarks. This release includes two versions: a pre-trained model (Baichuan-13B-Base) and an aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:
 
+1. **An open-source, commercially usable Chinese LLM at the ten-billion-parameter scale**: Baichuan-13B-Base is a free, open-source, commercially usable Chinese pre-trained language model with 13 billion parameters. It has not undergone any instruction tuning or benchmark-specific optimization, so it is clean and highly customizable, filling the gap left by the scarcity of readily usable Chinese pre-trained models above 10 billion parameters.
+
+2. **Larger size, more data**: Building on Baichuan-7B, the parameter count is expanded to 13 billion, and the model is trained on 1.4 trillion tokens of high-quality corpora, the most training data of any open-source 13B model to date. It supports both Chinese and English, uses [ALiBi](https://arxiv.org/abs/2108.12409) position encoding, and has a 4096-token context window.
+
+3. **Pre-trained and aligned models released together**: The pre-trained model is a "base" aimed at developers, while ordinary users have a stronger need for an aligned model with dialogue capability. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong dialogue ability, works out of the box, and is simple to deploy.
+
+4. **More efficient inference**: To support a wider range of users, this release also open-sources int8 and int4 quantized versions, which can be deployed on machines with limited GPU memory at almost no loss in quality.
 
 ## How to Get Started with the Model
 
@@ -60,34 +69,37 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 <!-- Provide the basic links for the model. -->
 
-
-
-
-
+整体模型基于Baichuan-7B,为了获得更好的推理性能,Baichuan-13B 使用了 ALiBi 线性偏置技术,相对于 Rotary Embedding 计算量更小,对推理性能有显著提升;与标准的 LLaMA-13B 相比,生成 2000 个 tokens 的平均推理速度 (tokens/s),实测提升 31.6%:
+
+| Model        | tokens/s |
+|--------------|----------|
+| LLaMA-13B    | 19.4     |
+| Baichuan-13B | 25.4     |
 
 具体参数和见下表
 | Hyperparameter  | Value |
 |-----------------|-------|
-| n_parameters    |       |
-| n_layers        |       |
-| n_heads         |       |
-| d_model         |       |
+| n_parameters    | xxx   |
+| n_layers        | 40    |
+| n_heads         | 40    |
+| d_model         | 5120  |
 | vocab size      | 64000 |
 | sequence length | 4096  |
 
-The overall model is based on
+The overall model is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses the ALiBi linear-bias technique, which is computationally cheaper than Rotary Embedding and significantly improves inference speed; compared with the standard LLaMA-13B, the measured average speed (tokens/s) when generating 2000 tokens is 31.6% higher:
 
-
-
-
+| Model        | tokens/s |
+|--------------|----------|
+| LLaMA-13B    | 19.4     |
+| Baichuan-13B | 25.4     |
 
 The specific parameters are as follows:
 | Hyperparameter  | Value |
 |-----------------|-------|
-| n_parameters    |       |
-| n_layers        |       |
-| n_heads         |       |
-| d_model         |       |
+| n_parameters    | xxx   |
+| n_layers        | 40    |
+| n_heads         | 40    |
+| d_model         | 5120  |
 | vocab size      | 64000 |
 | sequence length | 4096  |
 
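The speedup claimed in this hunk comes from replacing rotary position embeddings with ALiBi's static attention biases. Here is a minimal sketch of the bias described in the [ALiBi paper](https://arxiv.org/abs/2108.12409), for illustration only, not Baichuan-13B's actual implementation:

```python
# Sketch of ALiBi attention biases (https://arxiv.org/abs/2108.12409).
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: the paper's geometric sequence 2^(-8h/n) for a
    # power-of-two head count; non-powers of two use an interleaved variant,
    # simplified away here.
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Relative offsets: entry [i, j] is j - i, i.e. -(distance) below the diagonal.
    pos = torch.arange(seq_len)
    offset = pos[None, :] - pos[:, None]                 # (seq, seq)
    # The bias is simply added to the attention logits (with the usual causal
    # mask); there is no per-step rotary math and no position-dependent
    # transform of cached keys, which is where the inference savings come from.
    return slopes[:, None, None] * offset[None, :, :]    # (heads, seq, seq)

# Example with the table's values: 40 heads, 4096-token window.
# bias = alibi_bias(40, 4096); logits = q @ k.transpose(-1, -2) / d**0.5 + bias
```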
@@ -129,6 +141,7 @@ For specific training settings, please refer to [Baichuan-13B](https://github.co
 
 我们在各个 benchmark 下进行了`5-shot`评测,所采用的方法和 [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/) 项目中相同。结果如下:
 
+We conducted a `5-shot` evaluation on each benchmark, using the same method as in the [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/) project. The results are as follows:
 ## C-Eval
 
 | Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
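For context, here is a minimal sketch of what `5-shot` prompting looks like for a C-Eval-style multiple-choice benchmark; the real harness lives in the linked Baichuan-7B repository, and the field names used here ("question", "choices", "answer") are hypothetical:

```python
# Sketch of 5-shot multiple-choice prompting; not the project's actual harness.

def format_example(ex: dict, with_answer: bool) -> str:
    """Render one question with lettered choices, optionally with its answer."""
    lines = [ex["question"]]
    for label, choice in zip("ABCD", ex["choices"]):
        lines.append(f"{label}. {choice}")
    lines.append("Answer: " + (ex["answer"] if with_answer else ""))
    return "\n".join(lines)

def build_5shot_prompt(dev_examples: list[dict], test_question: dict) -> str:
    """Prepend five solved dev-set examples, then pose the unsolved question."""
    parts = [format_example(ex, with_answer=True) for ex in dev_examples[:5]]
    parts.append(format_example(test_question, with_answer=False))
    return "\n\n".join(parts)

# The model's prediction for the trailing "Answer:" slot (its next-token
# logits over "A"/"B"/"C"/"D", or the generated letter) is then scored
# against the gold answer to produce the tables below.
```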