---
language: zh
widget:
- text: "子墨子曰"
---
# Ancient Chinese GPT2 model
## Model description
This is a GPT2 model trained to generate ancient Chinese text, using `bert-base-chinese` as its tokenizer.
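To see how the two pieces fit together, here is a minimal sketch (assuming only the checkpoint and tokenizer names above) that loads both and checks that the GPT2 embedding table lines up with the BERT vocabulary:
```python
from transformers import BertTokenizer, GPT2LMHeadModel

# Load the BERT tokenizer the model was trained with, and the GPT2 weights.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")

# Since the model was trained against this tokenizer, its input embedding
# table should match the tokenizer's vocabulary size.
print(model.get_input_embeddings().num_embeddings, tokenizer.vocab_size)
```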
## Training data
It was trained on classical Chinese texts fetched from ctext.org (a hypothetical fetching sketch follows the reading list below). The current training corpus is small: about 19 MB of text in total.
<details>
<summary> "Reading list" of this model </summary>
* 孟子 "mengzi"
* 论语 "analects"
* 商君书 "shang-jun-shu"
* 礼记 "liji"
* 孙子兵法 "art-of-war"
* 墨子 "mozi"
* 庄子 "zhuangzi"
* 道德经 "dao-de-jing"
* 韩非子 "hanfeizi"
* 史记 "shiji"
* 战国策 "zhan-guo-ce"
* 汉书 "han-shu"
* 后汉书 "hou-han-shu"
* 三国志 "sanguozhi"
* 世说新语 "shi-shuo-xin-yu"
* 颜氏家训 "yan-shi-jia-xun"
* 金瓶梅 "jinpingmei"
* 西游记 "xiyouji"
* 红楼梦 "hongloumeng"
</details>
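The snippet below is a hypothetical sketch of fetching one of these texts through the public ctext.org API. The endpoint and the response fields (`fulltext`, `subsections`) are assumptions based on that API's public documentation, not a description of the actual training pipeline:
```python
import requests

# Hypothetical sketch: fetch passages from the public ctext.org API.
# The endpoint URL and the "fulltext"/"subsections" response fields are
# assumptions; they may differ from what was actually used for training.
API = "https://api.ctext.org/gettext"

def fetch_passages(urn: str) -> list:
    data = requests.get(API, params={"urn": urn}).json()
    passages = list(data.get("fulltext", []))
    # Book-level URNs list their chapters under "subsections"; recurse.
    for sub in data.get("subsections", []):
        passages.extend(fetch_passages(sub))
    return passages

corpus = fetch_passages("ctp:analects")  # e.g. 论语 from the reading list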
## How to use
You can use the model directly with a pipeline for text generation:
```python
from transformers import pipeline, GPT2LMHeadModel

# Load the fine-tuned GPT2 weights and pair them with the BERT tokenizer.
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")
generator = pipeline('text-generation', model=model, tokenizer='bert-base-chinese')

# Beam search with a repetition penalty; these settings produced the sample below.
outputs = generator("子墨子曰", max_length=50, num_return_sequences=5, num_beams=10, repetition_penalty=1.5)
[{'generated_text': '子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时 , 有 为 之 者 。 氏 , 圣 王 之 时 , 万 乘 之 世 , 圣 人 不 易 之 道 也 。'}]
```
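If you prefer to call `generate` directly instead of going through the pipeline, here is a minimal sketch. The `add_special_tokens=False` choice is an assumption, since `bert-base-chinese` would otherwise wrap the prompt in `[CLS]`/`[SEP]` markers:
```python
from transformers import BertTokenizer, GPT2LMHeadModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")

# Encode the prompt without BERT's [CLS]/[SEP] markers (an assumption about
# how the training data was tokenized).
inputs = tokenizer("子墨子曰", return_tensors="pt", add_special_tokens=False)

# Beam search with a repetition penalty, mirroring the pipeline call above.
outputs = model.generate(
    inputs.input_ids,
    max_length=50,
    num_beams=10,
    num_return_sequences=5,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.pad_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```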