shibing624 committed on
Commit
869843b
1 Parent(s): 30b7beb

Update README.md

Files changed (1)
  1. README.md +108 -1
README.md CHANGED
@@ -1,3 +1,110 @@
  ---
- license: apache-2.0
  ---
  ---
+ language:
+ - zh
+ tags:
+ - songnet
+ - pytorch
+ - zh
+ - Text2Text-Generation
+ license: "apache-2.0"
+ widget:
+ - text: "丹枫江冷人初去"
+
  ---
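+
+ # SongNet for Chinese Couplet (songnet-base-chinese-couplet) Model
+ SongNet model for Chinese couplet generation
+
+ Results of `songnet-base-chinese-couplet` on the couplet test data:
+
+ The overall performance of SongNet on the couplet **test** set:
+
+ |input_text|target_text|pred|
+ |:--- |:--- |:-- |
+ |春回大地,对对黄莺鸣暖树|日照神州,群群紫燕衔新泥|福至人间,家家紫燕舞和风|
+
+ On the couplet test set, the generated results satisfy the requirements of equal character count, part-of-speech alignment, lexical alignment, and similarity of form. The purpose-built SongNet network structure performs clearly better than models such as T5 and GPT2 on neat semantic parallelism and on conformance to tonal (平仄) patterns.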
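
Of the properties listed above, equal character count is the easiest to check mechanically. The following is a minimal sketch, not part of textgen: `same_shape` and `is_punct` are hypothetical helper names, and the check covers only length and punctuation position. It says nothing about part-of-speech alignment, parallelism, or tonal rules.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """True for any Unicode punctuation character, e.g. the full-width comma."""
    return unicodedata.category(ch).startswith("P")


def same_shape(first: str, second: str) -> bool:
    """Check that two lines have equal length and identical punctuation
    marks at identical positions (hypothetical helper, illustration only)."""
    if len(first) != len(second):
        return False
    for a, b in zip(first, second):
        # A punctuation mark on either side must be matched exactly.
        if (is_punct(a) or is_punct(b)) and a != b:
            return False
    return True


# Input line and model prediction from the table row above.
print(same_shape("春回大地,对对黄莺鸣暖树", "福至人间,家家紫燕舞和风"))
```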
+
+ SongNet network structure:
+
+ ![arch](songnet-network.png)
+
+ ## Usage
+
+ This model is released as part of the open-source text generation project [textgen](https://github.com/shibing624/textgen), which supports SongNet models. It can be called as follows:
+
+ Install package:
+ ```shell
+ pip install -U textgen
+ ```
+
+ ```python
+ from textgen.language_modeling import SongNetModel
+
+
+ model = SongNetModel(model_type='songnet', model_name='songnet-base-chinese-couplet')
+ # Each input is written as author<s1>title<s2>line</s>line...;
+ # the couplet inputs below leave author and title empty.
+ sentences = [
+     "严蕊<s1>如梦令<s2>道是梨花不是。</s>道是杏花不是。</s>白白与红红,别是东风情味。</s>曾记。</s>曾记。</s>人在武陵微醉。",
+     "<s1><s2>一句相思吟岁月</s>千杯美酒醉风情",
+     "<s1><s2>几树梅花数竿竹</s>一潭秋水半屏山",
+     "<s1><s2>未舍东江开口咏</s>且施妙手点睛来",
+     "<s1><s2>一去二三里</s>烟村四五家",
+ ]
+ print("inputs:", sentences)
+ print("outputs:", model.generate(sentences))
+ # Underscores mark the characters that the model should fill in.
+ sentences = [
+     "<s1><s2>一句____月</s>千杯美酒__情",
+     "<s1><s2>一去二三里</s>烟村__家</s>亭台__座</s>八__枝花",
+ ]
+ print("inputs:", sentences)
+ print("outputs:", model.fill_mask(sentences))
+
+ ```
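
The input strings in the example follow a visible pattern: optional author, `<s1>`, optional title, `<s2>`, then text lines joined by `</s>`, with `_` standing in for characters to fill. A minimal sketch for assembling such strings is shown below. `make_prompt` and `mask` are hypothetical helpers written for illustration; they are not part of textgen, and the pattern is inferred from the examples rather than from a format specification.

```python
def make_prompt(lines, author="", title=""):
    """Join text lines into 'author<s1>title<s2>line</s>line...'.

    Hypothetical helper; the layout is inferred from the usage example.
    """
    return f"{author}<s1>{title}<s2>" + "</s>".join(lines)


def mask(text, start, count):
    """Replace `count` characters starting at index `start` with underscores."""
    return text[:start] + "_" * count + text[start + count:]


# Full couplet, as passed to generate() in the usage example.
couplet = make_prompt(["一句相思吟岁月", "千杯美酒醉风情"])

# Template with blanks, as passed to fill_mask() in the usage example.
template = make_prompt([
    mask("一句相思吟岁月", 2, 4),
    mask("千杯美酒醉风情", 4, 2),
])

print(couplet)
print(template)
```

Building the strings from plain lines keeps the tag placement in one place, which avoids hand-typed mistakes such as a missing `</s>` separator.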
+
+
+ Model files:
+ ```
+ songnet-base-chinese-couplet
+ ├── pytorch_model.bin
+ └── vocab.txt
+ ```
+
+
+ ### Training dataset
+ #### Chinese couplet dataset
+
+ - Data: [couplet dataset (GitHub)](https://github.com/wb14123/couplet-dataset), [cleaned couplet dataset (GitHub)](https://github.com/v-zich/couplet-clean-dataset)
+ - Related links
+   - [Hugging Face](https://huggingface.co/)
+   - [SongNet paper](https://aclanthology.org/2020.acl-main.68/)
+   - [textgen](https://github.com/shibing624/textgen)
+
+
+ Data format:
+
+ ```text
+ head -n 1 couplet_files/couplet/train/in.txt
+ 晚 风 摇 树 树 还 挺
+
+ head -n 1 couplet_files/couplet/train/out.txt
+ 晨 露 润 花 花 更 red
+ ```
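
The dataset stores each half of a couplet as space-separated characters, on matching lines of `in.txt` and `out.txt`. A minimal sketch of turning one such pair into the tagged string form used in the Usage example follows. `join_tokens` and `to_tagged` are hypothetical names, and whether the training script performs this conversion internally is not stated here, so this is an illustration of the two formats and not a description of the training pipeline.

```python
def join_tokens(line: str) -> str:
    """Drop the whitespace between characters in one dataset line."""
    return "".join(line.split())


def to_tagged(first_line: str, second_line: str) -> str:
    """Build '<s1><s2>first</s>second' from one in.txt / out.txt line pair.

    Hypothetical helper; author and title fields are left empty.
    """
    return f"<s1><s2>{join_tokens(first_line)}</s>{join_tokens(second_line)}"


# The first lines of in.txt and out.txt shown above.
example = to_tagged("晚 风 摇 树 树 还 挺\n", "晨 露 润 花 花 更 红\n")
print(example)
```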
+
+
+ To train a SongNet model, see [training_zh_songnet_demo.py](https://github.com/shibing624/textgen/blob/main/examples/language_generation/training_zh_songnet_demo.py).
+
+
+ ## Citation
+
+ ```latex
+ @software{textgen,
+   author = {Xu Ming},
+   title = {textgen: Implementation of Text Generation models},
+   year = {2022},
+   url = {https://github.com/shibing624/textgen},
+ }
+ ```
+