Update README.md
README.md CHANGED
@@ -30,24 +30,24 @@ Baichuan-13B is an open-source, commercially available large-scale language model
## How to Get Started with the Model

-如下是一个使用
+The following is a 1-shot inference task using Baichuan-13B: given the title of a work, the model outputs the author's name. The correct output is "夜雨寄北->李商隐".
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/
-model = AutoModelForCausalLM.from_pretrained("baichuan-inc/
+tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

-The following is a task of performing 1-shot inference using
+The following is a 1-shot inference task using Baichuan-13B, where the model predicts the author's name from the title of a work. The correct output is "One Hundred Years of Solitude->Gabriel Garcia Marquez".
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

-tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/
-model = AutoModelForCausalLM.from_pretrained("baichuan-inc/
+tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

@@ -77,14 +77,10 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
| Baichuan-13B | 25.4 |

The specific parameters are shown in the table below:
-| n_heads | 40 |
-| d_model | 5120 |
-| vocab size | 64000 |
-| sequence length | 4096 |
+| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+|------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |

The overall model is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases (illustrated in the sketch below), which carry a smaller computational load than Rotary Embedding and significantly improve inference performance. In tests, the average inference speed (tokens/s) when generating 2000 tokens is 31.6% higher than that of the standard LLaMA-13B:
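ALiBi, for reference, drops rotary position embeddings entirely and instead adds a fixed, head-specific linear penalty to the raw attention scores based on query-key distance, which is where the inference-time saving comes from. The sketch below only illustrates that bias computation under common ALiBi conventions; it is not Baichuan-13B's actual implementation, and the slope schedule is exact only for power-of-two head counts.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes from the ALiBi paper: head h gets 2^(-8*(h+1)/num_heads).
    # (Exact for power-of-two head counts; implementations interpolate otherwise.)
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # Signed distance (j - i) between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]                # (seq_len, seq_len)
    # Linear penalty: zero on the diagonal, increasingly negative for older keys.
    return slopes[:, None, None] * distance[None, :, :]   # (num_heads, seq_len, seq_len)

# The bias is simply added to the attention scores before softmax; no rotation of
# queries and keys is required. A causal mask would still be applied in a real decoder.
scores = torch.randn(40, 16, 16)                          # (heads, queries, keys); 40 heads as in Baichuan-13B
probs = torch.softmax(scores + alibi_bias(num_heads=40, seq_len=16), dim=-1)
```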
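The 31.6% figure above is an average of generated tokens per second. One plausible way to measure such a number with the model and tokenizer loaded earlier is sketched here; the 2000-token budget mirrors the text, but the rest (greedy decoding, a single timed run) is an assumption rather than the authors' benchmark setup.

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt: str, new_tokens: int = 2000) -> float:
    # Time one long generation and report generated tokens per second.
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    out = model.generate(**inputs, max_new_tokens=new_tokens,
                         min_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    generated = out.shape[1] - inputs['input_ids'].shape[1]
    return generated / (time.time() - start)

prompt = '登鹳雀楼->王之涣\n夜雨寄北->'
print(f"{tokens_per_second(model, tokenizer, prompt):.1f} tokens/s")
```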
@@ -94,14 +90,10 @@ The overall model is based on Baichuan-7B. In order to achieve better inference
| Baichuan-13B | 25.4 |

The specific parameters are as follows:
-| n_heads | 40 |
-| d_model | 5120 |
-| vocab size | 64000 |
-| sequence length | 4096 |
+| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+|------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |

## Uses