s-JoL committed
Commit 1b9faad
1 Parent(s): ae9228d

Update README.md

Files changed (1)
  1. README.md +14 -22
README.md CHANGED
@@ -30,24 +30,24 @@ Baichuan-13B is an open-source, commercially available large-scale language model
 
 ## How to Get Started with the Model
 
- The following is a 1-shot inference task using baichuan-7B: given a literary work, the model is asked to name its author. The correct output is "夜雨寄北->李商隐".
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
- tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
 inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
 inputs = inputs.to('cuda:0')
 pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
 print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 ```
 
- The following is a 1-shot inference task using baichuan-7B: given a work, the model is asked to name its author. The correct output is "One Hundred Years of Solitude->Gabriel Garcia Marquez".
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
- tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
 inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
 inputs = inputs.to('cuda:0')
 pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
@@ -77,14 +77,10 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 | Baichuan-13B | 25.4 |
 
 The specific parameters are shown in the table below:
- | Hyperparameter | Value |
- |----------------|-------|
- | n_parameters | xxx |
- | n_layers | 40 |
- | n_heads | 40 |
- | d_model | 5120 |
- | vocab size | 64000 |
- | sequence length | 4096 |
 
 The overall model is based on Baichuan-7B. In order to achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which carry a smaller computational load than Rotary Embedding. Compared with the standard LLaMA-13B, the average inference speed (tokens/s) when generating 2,000 tokens was measured to be 31.6% higher:
 
@@ -94,14 +90,10 @@ The overall model is based on Baichuan-7B. In order to achieve better inference
 | Baichuan-13B | 25.4 |
 
 The specific parameters are as follows:
- | Hyperparameter | Value |
- |----------------|-------|
- | n_parameters | xxx |
- | n_layers | 40 |
- | n_heads | 40 |
- | d_model | 5120 |
- | vocab size | 64000 |
- | sequence length | 4096 |
 
 ## Uses
 
 
 
 ## How to Get Started with the Model
 
+ The following is a 1-shot inference task using Baichuan-13B: given a literary work, the model is asked to name its author. The correct output is "夜雨寄北->李商隐".
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
+ tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
 inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
 inputs = inputs.to('cuda:0')
 pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
 print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 ```
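 If you only want the generated completion (e.g. the author name) rather than the echoed prompt, you can slice off the prompt tokens before decoding. A minimal, illustrative variant reusing the `inputs`, `pred`, and `tokenizer` objects from the snippet above (standard transformers usage, not part of the original card):
 ```python
 # Decode only the newly generated tokens, skipping the prompt portion of `pred`.
 prompt_len = inputs['input_ids'].shape[-1]
 completion = tokenizer.decode(pred[0, prompt_len:].cpu(), skip_special_tokens=True)
 print(completion)
 ```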
 
+ The following is a 1-shot inference task using Baichuan-13B: given a work, the model is asked to name its author. The correct output is "One Hundred Years of Solitude->Gabriel Garcia Marquez".
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
+ tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B", device_map="auto", trust_remote_code=True)
 inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
 inputs = inputs.to('cuda:0')
 pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)

 | Baichuan-13B | 25.4 |
 
 The specific parameters are shown in the table below:
+ | Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+ |------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+ | Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+ | Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |
 
 The overall model is based on Baichuan-7B. In order to achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which carry a smaller computational load than Rotary Embedding. Compared with the standard LLaMA-13B, the average inference speed (tokens/s) when generating 2,000 tokens was measured to be 31.6% higher:
 
 | Baichuan-13B | 25.4 |
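 As context for the ALiBi claim above, here is a minimal, self-contained sketch of how an ALiBi-style attention bias can be computed. This is illustrative only: it uses the closed-form slope formula for power-of-two head counts from the ALiBi paper and is not taken from the Baichuan-13B implementation.
 ```python
 import torch

 def alibi_slopes(n_heads: int) -> torch.Tensor:
     # Geometric slopes from the ALiBi paper; this closed form is exact for power-of-two head counts.
     start = 2.0 ** (-8.0 / n_heads)
     return torch.tensor([start ** (i + 1) for i in range(n_heads)])

 def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
     # Static, distance-proportional bias added to the attention logits.
     # Unlike RoPE, nothing is rotated per position, so there is no extra per-token math on q/k.
     positions = torch.arange(seq_len)
     distance = positions[None, :] - positions[:, None]    # (T, T), value j - i
     slopes = alibi_slopes(n_heads)                        # (H,)
     return slopes[:, None, None] * distance[None, :, :]   # (H, T, T)

 # The bias would be added to q @ k.T / sqrt(d) before the causal mask and softmax.
 print(alibi_bias(n_heads=4, seq_len=5)[0])
 ```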
 
 The specific parameters are as follows:
+ | Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
+ |------------|-------------|------------|---------------------|------------|--------------|------------------------|--------------------|------------|
+ | Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
+ | Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |
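 These figures can be sanity-checked against the published checkpoint. A rough, illustrative check, assuming the "baichuan-inc/Baichuan-13B" repo id used above and that the remote config exposes the standard transformers attribute names (a custom config may rename them):
 ```python
 from transformers import AutoConfig, AutoModelForCausalLM

 repo_id = "baichuan-inc/Baichuan-13B"  # repo id assumed from the snippets above
 config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)

 # Attribute names follow the usual transformers convention; they may differ for custom code.
 print(getattr(config, "hidden_size", None),
       getattr(config, "num_hidden_layers", None),
       getattr(config, "num_attention_heads", None),
       getattr(config, "vocab_size", None))

 # Total parameter count, to compare against the 13,264,901,120 figure in the table
 # (note: this downloads and instantiates the full ~13B-parameter checkpoint).
 model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
 print(sum(p.numel() for p in model.parameters()))
 ```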
 
 
 
 
 
 ## Uses