---
language:
- zh
tags:
- SongNet
- pytorch
- zh
- Text2Text-Generation
license: "apache-2.0"
widget:
- text: "丹枫江冷人初去"
---

# SongNet for Chinese Couplet (songnet-base-chinese-couplet) Model

A SongNet model for writing Chinese couplets.

Evaluation of `songnet-base-chinese-couplet` on the couplet **test** data:

|input_text|predict|
|:--- |:--- |
|一句相思吟岁月,千杯美酒醉风情|一生只剩诗和酒,满腹无关雪与梅|

On the couplet test set, the generated results satisfy the formal requirements of a couplet: equal character counts, aligned parts of speech, aligned wording, and similar form. With its purpose-built network structure, SongNet performs clearly better than models such as T5 and GPT2 on semantic parallelism and on compliance with tonal (平仄) patterns.

SongNet network architecture:

![arch](songnet-network.png)

## Usage

This model is open-sourced as part of the text generation project [textgen](https://github.com/shibing624/textgen), which supports SongNet models. It can be called as follows.

Install package:

```shell
pip install -U textgen
```

```python
import sys

# Only relevant when running from inside a source checkout of the project;
# harmless when textgen is installed with pip.
sys.path.append('..')
from textgen.language_modeling import SongNetModel

model = SongNetModel(model_type='songnet', model_name='shibing624/songnet-base-chinese-couplet')

# Generate a matching line for each input.
sentences = [
    "严蕊如梦令道是梨花不是。道是杏花不是。白白与红红,别是东风情味。曾记。曾记。人在武陵微醉。",
    "一句相思吟岁月千杯美酒醉风情",
    # Note: there is no comma after the next literal, so Python concatenates it
    # with the literal that follows and the model receives them as one input
    # (this matches the recorded output below). Add a comma to pass them separately.
    "几树梅花数竿竹一潭秋水半屏山"
    "未舍东江开口咏且施妙手点睛来",
    "一去二三里烟村四五家",
]
print("inputs:", sentences)
print("outputs:", model.generate(sentences))

# Fill in the positions marked with underscores.
sentences = [
    "一句____月千杯美酒__情",
    "一去二三里烟村__家亭台__座八__枝花",
]
print("inputs:", sentences)
print("outputs:", model.fill_mask(sentences))
```

output:

```shell
inputs: ['严蕊如梦令道是梨花不是。道是杏花不是。白白与红红,别是东风情味。曾记。曾记。人在武陵微醉。', '一句相思吟岁月千杯美酒醉风情', '几树梅花数竿竹一潭秋水半屏山未舍东江开口咏且施妙手点睛来', '一去二三里烟村四五家']
outputs: ['盛世欣开新气象春联喜绘大文章春天铺锦笺,宏图更写好山山新篇章新篇章神州高唱好年华', '一曲琴音添雅韵几回酒醉解愁思', '三分天下隆中对四面八方九派江山笔底留', '春深花已老夜静露方浓']
inputs: ['一句____月千杯美酒__情', '一去二三里烟村__家亭台__座八__枝花']
outputs: ['一句佳诗吟盛月千杯美酒祝春情', '一去二三里烟村百二家亭台十二座八里一枝花']
```

Model files:

```
songnet-base-chinese-couplet
├── pytorch_model.bin
└── vocab.txt
```

### Training datasets

#### Chinese couplet dataset

- Data: [couplet dataset (GitHub)](https://github.com/wb14123/couplet-dataset), [cleaned couplet dataset (GitHub)](https://github.com/v-zich/couplet-clean-dataset)
- Related resources
  - [Huggingface](https://huggingface.co/)
  - [SongNet paper](https://aclanthology.org/2020.acl-main.68/)
  - [textgen](https://github.com/shibing624/textgen)

Data format:

```text
head -n 1 couplet_files/couplet/train/in.txt
晚 风 摇 树 树 还 挺

head -n 1 couplet_files/couplet/train/out.txt
晨 露 润 花 花 更 红
```

To train a SongNet model, see [https://github.com/shibing624/textgen/blob/main/examples/language_generation/training_zh_songnet_demo.py](https://github.com/shibing624/textgen/blob/main/examples/language_generation/training_zh_songnet_demo.py).

## Citation

```latex
@software{textgen,
  author = {Xu Ming},
  title = {textgen: Implementation of Text Generation models},
  year = {2022},
  url = {https://github.com/shibing624/textgen},
}
```
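
## Appendix: pairing lines from the data files

The data format shown earlier stores one couplet line per row, with characters separated by spaces, in two parallel files (`in.txt` for first lines, `out.txt` for second lines). The sketch below shows one way such rows might be joined into plain strings and paired up, dropping pairs whose character counts differ. The helper names `join_chars` and `pair_couplets` are hypothetical and are not part of textgen; how textgen itself reads these files is not described here, so treat this as an illustration of the format only.

```python
from typing import Iterable, List, Tuple


def join_chars(line: str) -> str:
    """Turn a space-separated row such as '晚 风 摇' into '晚风摇'."""
    return "".join(line.split())


def pair_couplets(
    in_lines: Iterable[str], out_lines: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair first and second lines, keeping only pairs of equal length."""
    pairs = []
    for first, second in zip(in_lines, out_lines):
        first, second = join_chars(first), join_chars(second)
        # A couplet's two lines must have the same number of characters.
        if first and len(first) == len(second):
            pairs.append((first, second))
    return pairs


# Rows copied from the data format example (as they would be read from the files).
in_lines = ["晚 风 摇 树 树 还 挺\n"]
out_lines = ["晨 露 润 花 花 更 红\n"]

pairs = pair_couplets(in_lines, out_lines)
print(pairs)
```

In practice the two lists could come from reading `in.txt` and `out.txt` with `open(path, encoding="utf-8")`; the equal-length filter mirrors the "equal character counts" requirement stated for the test results.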