Update README.md
README.md CHANGED
@@ -30,24 +30,24 @@ Baichuan-13B is an open-source, commercially available large-scale language model
## How to Get Started with the Model

-如下是一个使用
+The following is a 1-shot inference task using Baichuan-13B: given the title of a work, the model outputs the author's name. The correct output is "夜雨寄北->李商隐".
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/
-model = AutoModelForCausalLM.from_pretrained("baichuan-inc/
+tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

-The following is a task of performing 1-shot inference using
+The following is a 1-shot inference task using Baichuan-13B, where the model predicts the author's name from the title of a work. The correct output is "One Hundred Years of Solitude->Gabriel Garcia Marquez".
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/
-model = AutoModelForCausalLM.from_pretrained("baichuan-inc/
+tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

@@ -77,14 +77,10 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
| Baichuan-13B | 25.4 |

The specific parameters are shown in the table below:
-| n_heads | 40 |
-| d_model | 5120 |
-| vocab size | 64000 |
-| sequence length | 4096 |
+| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+|------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |

The overall model is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases (illustrated in the sketch below), which carry a smaller computational load than Rotary Embedding and significantly improve inference performance. In tests, the average inference speed (tokens/s) when generating 2000 tokens is 31.6% higher than that of the standard LLaMA-13B:
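ALiBi, for reference, drops rotary position embeddings entirely and instead adds a fixed, head-specific linear penalty to the raw attention scores based on query-key distance, which is where the inference-time saving comes from. The sketch below only illustrates that bias computation under common ALiBi conventions; it is not Baichuan-13B's actual implementation, and the slope schedule is exact only for power-of-two head counts.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes from the ALiBi paper: head h gets 2^(-8*(h+1)/num_heads).
    # (Exact for power-of-two head counts; implementations interpolate otherwise.)
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # Signed distance (j - i) between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]                # (seq_len, seq_len)
    # Linear penalty: zero on the diagonal, increasingly negative for older keys.
    return slopes[:, None, None] * distance[None, :, :]   # (num_heads, seq_len, seq_len)

# The bias is simply added to the attention scores before softmax; no rotation of
# queries and keys is required. A causal mask would still be applied in a real decoder.
scores = torch.randn(40, 16, 16)                          # (heads, queries, keys); 40 heads as in Baichuan-13B
probs = torch.softmax(scores + alibi_bias(num_heads=40, seq_len=16), dim=-1)
```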
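The 31.6% figure above is an average of generated tokens per second. One plausible way to measure such a number with the model and tokenizer loaded earlier is sketched here; the 2000-token budget mirrors the text, but the rest (greedy decoding, a single timed run) is an assumption rather than the authors' benchmark setup.

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, new_tokens: int = 2000) -> float:
    # Time one long generation and report generated tokens per second.
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    out = model.generate(**inputs, max_new_tokens=new_tokens,
                         min_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    generated = out.shape[1] - inputs['input_ids'].shape[1]
    return generated / (time.time() - start)

prompt = '登鹳雀楼->王之涣\n夜雨寄北->'
print(f"{tokens_per_second(model, tokenizer, prompt):.1f} tokens/s")
```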
@@ -94,14 +90,10 @@ The overall model is based on Baichuan-7B. In order to achieve better inference
| Baichuan-13B | 25.4 |

The specific parameters are as follows:
-| n_heads | 40 |
-| d_model | 5120 |
-| vocab size | 64000 |
-| sequence length | 4096 |
+| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+|------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |

## Uses