---
language: zh
widget: 
- text: "子墨子曰"
---

# Ancient Chinese GPT2 model

## Model description
This is a GPT2 model trained to generate ancient Chinese text, using `bert-base-chinese` as the tokenizer.
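
As a quick illustration of the tokenizer pairing, `bert-base-chinese` splits Chinese text character by character; the small sketch below shows the expected per-character split (an illustrative check, not part of the original card):

```python
from transformers import BertTokenizerFast

# The tokenizer the model was trained with; it splits Chinese text per character
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
print(tokenizer.tokenize("子墨子曰"))  # expected: ['子', '墨', '子', '曰']
```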

## Training data
It was trained on classical Chinese texts fetched from ctext.org. The current training set is quite small: all training text together amounts to about 19 MB.

<details>
  <summary> "Reading list" of this model </summary>
  
    * 孟子 "mengzi"
    * 论语 "analects"
    * 商君书 "shang-jun-shu"
    * 礼记 "liji"
    * 孙子兵法 "art-of-war"
    * 墨子 "mozi"
    * 庄子 "zhuangzi"
    * 道德经 "dao-de-jing"
    * 韩非子 "hanfeizi"
    * 史记 "shiji"
    * 战国策 "zhan-guo-ce"
    * 汉书 "han-shu"
    * 后汉书 "hou-han-shu"
    * 三国志 "sanguozhi"
    * 世说新语 "shi-shuo-xin-yu"
    * 颜氏家训 "yan-shi-jia-xun"
    * 金瓶梅 "jinpingmei"
    * 西游记 "xiyouji"
    * 红楼梦 "hongloumeng"
</details>
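
For reference, here is a minimal sketch of how a corpus like this could be tokenized with `bert-base-chinese` and used to train a GPT2 model from scratch with the `transformers` library. The file name, context length, and training hyperparameters below are illustrative assumptions, not the settings actually used for this model.

```python
from datasets import load_dataset
from transformers import (BertTokenizerFast, DataCollatorForLanguageModeling,
                          GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

# Character-level Chinese tokenizer reused from BERT
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

# Plain-text corpus, one passage per line (illustrative file name)
dataset = load_dataset("text", data_files={"train": "ctext_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Small GPT2 configuration sized to the BERT vocabulary
config = GPT2Config(vocab_size=tokenizer.vocab_size, n_positions=128)
model = GPT2LMHeadModel(config)

# Causal language modeling: labels are the input ids, no masking
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ziyue-gpt2", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```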


## How to use
You can use the model directly with a pipeline for text generation:

```python
from transformers import pipeline, GPT2LMHeadModel

# Load the model weights and pair them with the bert-base-chinese tokenizer
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")
generator = pipeline('text-generation', model=model, tokenizer='bert-base-chinese')

# Generate five continuations of the prompt with beam search
outputs = generator("子墨子曰", max_length=50, num_return_sequences=5, num_beams=10, repetition_penalty=1.5)

[{'generated_text': '子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时 , 有 为 之 者 。 氏 , 圣 王 之 时 , 万 乘 之 世 , 圣 人 不 易 之 道 也 。'}]
```
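
Because `bert-base-chinese` tokenizes Chinese text character by character, the generated strings come back with spaces between characters, as in the output above. A simple post-processing step (a suggestion, not part of the original card) is to strip those spaces:

```python
# Join the per-character tokens back into continuous text
texts = [out['generated_text'].replace(' ', '') for out in outputs]
print(texts[0])
```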