---
language: zh
widget:
- text: "子墨子曰"
---

# Ancient Chinese GPT2 model

## Model description

This model is a GPT2 model trained to generate ancient Chinese text, using `bert-base-chinese` as its tokenizer.

## Training data

The model is trained on classical Chinese texts fetched from ctext.org. The current training set is very small: all of the training text together is 19 MB.

"Reading list" of this model:

* 孟子 "mengzi"
* 论语 "analects"
* 商君书 "shang-jun-shu"
* 礼记 "liji"
* 孙子兵法 "art-of-war"
* 墨子 "mozi"
* 庄子 "zhuangzi"
* 道德经 "dao-de-jing"
* 韩非子 "hanfeizi"
* 史记 "shiji"
* 战国策 "zhan-guo-ce"
* 汉书 "han-shu"
* 后汉书 "hou-han-shu"
* 三国志 "sanguozhi"
* 世说新语 "shi-shuo-xin-yu"
* 颜氏家训 "yan-shi-jia-xun"
* 金瓶梅 "jinpingmei"
* 西游记 "xiyouji"
* 红楼梦 "hongloumeng"

## How to use

You can use the model directly with a pipeline for text generation:

```python
from transformers import pipeline, GPT2LMHeadModel

# Load the GPT2 weights; the tokenizer is taken from bert-base-chinese
model = GPT2LMHeadModel.from_pretrained("binxu/Ziyue-GPT2")
generator = pipeline('text-generation', model=model, tokenizer='bert-base-chinese')

# Beam search with a repetition penalty, asking for 5 candidate continuations
outputs = generator("子墨子曰", max_length=50, num_return_sequences=5, num_beams=10, repetition_penalty=1.5)
print(outputs)
# Sample output:
# [{'generated_text': '子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时 , 有 为 之 者 。 氏 , 圣 王 之 时 , 万 乘 之 世 , 圣 人 不 易 之 道 也 。'}]
```
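
The sample output above contains a space between every token, and some punctuation comes back in half-width form. If you want contiguous text, the generated string can be post-processed. The following is a minimal sketch, not part of the model or of `transformers`: `clean_generated` and the punctuation table are hypothetical names introduced here, and the sketch assumes the output is purely Chinese text, since removing all whitespace would also join any Latin words.

```python
# Hypothetical helper: tidy the space-separated text returned by the pipeline.
# Assumption: the text is Chinese only, so all whitespace can be dropped.

# Map half-width punctuation to the full-width forms used in Chinese text.
HALF_TO_FULL = str.maketrans({":": ":", ",": ",", "?": "?", "!": "!", ";": ";"})


def clean_generated(text: str) -> str:
    """Remove whitespace between tokens and normalize punctuation width."""
    joined = "".join(text.split())
    return joined.translate(HALF_TO_FULL)


sample = "子墨子曰 : 吾 未 得 见 之 时 , 知 有 失 得 之 时"
print(clean_generated(sample))
# 子墨子曰:吾未得见之时,知有失得之时
```

With the pipeline from the example above, this could be applied as `[clean_generated(o['generated_text']) for o in outputs]`.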