|
--- |
|
language: zh |
|
widget: |
|
- text: "子墨子曰" |
|
--- |
|
|
|
# Ancient Chinese GPT2 model
|
|
|
## Model description |
|
This is a GPT2 model trained to generate ancient (classical) Chinese text, using `bert-base-chinese` as its tokenizer.
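
If you want to handle the tokenizer and model objects yourself rather than going through a pipeline, a minimal loading sketch (using the repository id from the usage example below) looks like this:

```python
from transformers import BertTokenizer, GPT2LMHeadModel

# The GPT2 weights use the BERT Chinese vocabulary, so the tokenizer
# and the model are loaded from different repositories.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")

# bert-base-chinese tokenizes Chinese text character by character.
print(tokenizer.tokenize("子墨子曰"))  # ['子', '墨', '子', '曰']
```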
|
|
|
## Training data |
|
It was trained on classical Chinese texts fetched from ctext.org (a rough fetching sketch follows the reading list below). The current training corpus is quite small: all the training text together is about 19 MB.
|
|
|
<details> |
|
<summary> "Reading list" of this model </summary> |
|
|
|
* 孟子 "mengzi" |
|
* 论语 "analects" |
|
* 商君书 "shang-jun-shu" |
|
* 礼记 "liji" |
|
* 孙子兵法 "art-of-war" |
|
* 墨子 "mozi" |
|
* 庄子 "zhuangzi" |
|
* 道德经 "dao-de-jing" |
|
* 韩非子 "hanfeizi" |
|
* 史记 "shiji" |
|
* 战国策 "zhan-guo-ce" |
|
* 汉书 "han-shu" |
|
* 后汉书 "hou-han-shu" |
|
* 三国志 "sanguozhi" |
|
* 世说新语 "shi-shuo-xin-yu" |
|
* 颜氏家训 "yan-shi-jia-xun" |
|
* 金瓶梅 "jinpingmei" |
|
* 西游记 "xiyouji" |
|
* 红楼梦 "hongloumeng" |
|
</details> |
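
The slugs above match the URL names used on ctext.org. As a rough, hypothetical sketch of how one title could be fetched through the site's public API (the endpoint, urn scheme, and response layout here are assumptions, not the author's actual pipeline):

```python
import requests

# Hypothetical fetch of one title via the ctext.org API; the "gettext"
# endpoint and the "ctp:" urn prefix are assumptions about the public API.
resp = requests.get("https://api.ctext.org/gettext", params={"urn": "ctp:mengzi"})
data = resp.json()

# Inspect the response layout before assuming any particular field names.
print(list(data.keys()))
```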
|
|
|
|
|
## How to use |
|
You can use the model directly with a pipeline for text generation: |
|
|
|
```python |
|
from transformers import pipeline, GPT2LMHeadModel

# Load the fine-tuned GPT2 weights and pair them with the
# bert-base-chinese tokenizer used during training.
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")
generator = pipeline('text-generation', model=model, tokenizer='bert-base-chinese')

# Beam search with a repetition penalty; returns five candidate sequences.
outputs = generator("子墨子曰", max_length=50, num_return_sequences=5, num_beams=10, repetition_penalty=1.5)
print(outputs)

# Example output (first of the five sequences):
# [{'generated_text': '子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时 , 有 为 之 者 。 氏 , 圣 王 之 时 , 万 乘 之 世 , 圣 人 不 易 之 道 也 。'}]
|
``` |
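
The spaces inside the generated strings come from the tokenizer, which treats every character as a separate token. A minimal cleanup sketch, assuming you want continuous text back:

```python
# Strip the tokenizer-inserted spaces to recover running text.
clean = [o['generated_text'].replace(' ', '') for o in outputs]
print(clean[0])
```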