metadata
language: zh
widget:
- text: 子墨子曰
Ancient Chinese GPT2 model
Model description
This is a GPT2 model trained to generate ancient Chinese text, using bert-base-chinese as the tokenizer.
Training data
The model is trained on classical Chinese texts fetched from ctext.org. The current training data is small in scale: all training text together totals 19 MB.
"Reading list" of this model
* 孟子 "mengzi"
* 论语 "analects"
* 商君书 "shang-jun-shu"
* 礼记 "liji"
* 孙子兵法 "art-of-war"
* 墨子 "mozi"
* 庄子 "zhuangzi"
* 道德经 "dao-de-jing"
* 韩非子 "hanfeizi"
* 史记 "shiji"
* 战国策 "zhan-guo-ce"
* 汉书 "han-shu"
* 后汉书 "hou-han-shu"
* 三国志 "sanguozhi"
* 世说新语 "shi-shuo-xin-yu"
* 颜氏家训 "yan-shi-jia-xun"
* 金瓶梅 "jinpingmei"
* 西游记 "xiyouji"
* 红楼梦 "hongloumeng"
How to use
You can use the model directly with a pipeline for text generation:
from transformers import pipeline, GPT2LMHeadModel

# Load the GPT2 weights; the tokenizer comes from bert-base-chinese.
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")
generator = pipeline('text-generation', model=model, tokenizer='bert-base-chinese')
# Beam search with 10 beams, returning the 5 best continuations of the prompt.
outputs = generator("子墨子曰", max_length=50, num_return_sequences=5, num_beams=10, repetition_penalty=1.5)
[{'generated_text': '子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时 , 有 为 之 者 。 氏 , 圣 王 之 时 , 万 乘 之 世 , 圣 人 不 易 之 道 也 。'}]
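The sample output above has a space between every character, most likely because the bert-base-chinese tokenizer joins decoded tokens with spaces. The following is a minimal post-processing sketch, not part of the model or of transformers: the helper name `strip_token_spaces` is hypothetical, and it assumes the generated text is purely Chinese, so that no whitespace is meaningful.

```python
def strip_token_spaces(text: str) -> str:
    """Remove all whitespace from decoded text.

    Assumption: the text is purely Chinese, so spaces carry no meaning
    and only separate the tokens produced by the decoder.
    """
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(text.split())


# A fragment of the sample output shown in this model card.
sample = "子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时"
cleaned = strip_token_spaces(sample)
print(cleaned)
```

With the pipeline from the usage example, the same helper could be applied to each result's `generated_text` field. If the output may contain Latin text or numbers, removing every space would merge words, so a more selective rule would be needed.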