	---
	language:
	- zh
	tags:
	- t5
	- pytorch
	- zh
	- Text2Text-Generation
	license: "apache-2.0"
	widget:
	- text: "对联：丹枫江冷人初去"

	---

	# T5 for Chinese Couplet (t5-chinese-couplet) Model
	T5 model for Chinese couplet generation.

	Sample result of `t5-chinese-couplet` on the couplet test set:

	| prefix | input_text | target_text | pred |
	|:--- |:--- |:--- |:--- |
	| 对联： | 春回大地，对对黄莺鸣暖树 | 日照神州，群群紫燕衔新泥 | 福至人间,家家紫燕舞和风 |

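	On the couplet test set, the generated lines meet the formal requirements: same character count as the input, part-of-speech alignment, word-level alignment, and matching form. They do not yet reliably achieve neat semantic parallelism or follow the tonal (平仄) rules.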
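
The first of these criteria, equal character count, is easy to check mechanically. The following is a minimal sketch, assuming that punctuation and whitespace should not count toward the length of a line. `han_length` and `same_length` are hypothetical helper names used for illustration; they are not part of textgen or of the model.

```python
import unicodedata


def han_length(line: str) -> int:
    """Count the characters in a line, ignoring punctuation and whitespace."""
    return sum(
        1
        for ch in line
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def same_length(first_line: str, second_line: str) -> bool:
    """Return True when both couplet lines have the same character count."""
    return han_length(first_line) == han_length(second_line)


# The input and prediction from the table above, which mix full-width and ASCII commas
print(same_length("春回大地，对对黄莺鸣暖树", "福至人间,家家紫燕舞和风"))
```

A check like this covers only the length criterion; part-of-speech alignment and tonal rules need linguistic resources that this sketch does not attempt.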

	Network architecture of T5 (the original T5):

	![arch](t5.png)

	## Usage

	This model is open-sourced as part of the text generation project [textgen](https://github.com/shibing624/textgen), which supports T5 models. Call it as follows:

	Install package:
	```shell
	pip install -U textgen
	```

	```python
	from textgen import T5Model
	model = T5Model("t5", "shibing624/t5-chinese-couplet")
	r = model.predict(["对联：丹枫江冷人初去"])
	print(r) # ['白石矶寒客不归']
	```

	## Usage (HuggingFace Transformers)
	Without [textgen](https://github.com/shibing624/textgen), you can use the model like this:

	Tokenize the input, call `generate` on the model, then decode the generated ids into text.

	Install package:
	```shell
	pip install transformers
	```

	```python
	from transformers import T5ForConditionalGeneration, T5Tokenizer

	tokenizer = T5Tokenizer.from_pretrained("shibing624/t5-chinese-couplet")
	model = T5ForConditionalGeneration.from_pretrained("shibing624/t5-chinese-couplet")


	def batch_generate(input_texts, max_length=64):
	    # padding=True lets inputs of different lengths share one batch tensor
	    features = tokenizer(input_texts, padding=True, return_tensors='pt')
	    outputs = model.generate(input_ids=features['input_ids'],
	                             attention_mask=features['attention_mask'],
	                             max_length=max_length)
	    # Decode the generated token ids back into text, dropping special tokens
	    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


	r = batch_generate(["对联：丹枫江冷人初去"])
	print(r)
	```

	output:
	```shell
	['白石矶寒客不归']
	```
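
Both usage examples pass the first line with the task prefix `对联：` already attached. A small helper can add the prefix when it is missing. This is a sketch only: `build_inputs` is a hypothetical name, and the assumption that the model expects this exact prefix on every input is inferred from the examples in this card.

```python
PREFIX = "对联："  # task prefix as it appears in the examples in this card


def build_inputs(first_lines):
    """Prepend the task prefix to each first line unless it is already present."""
    inputs = []
    for line in first_lines:
        line = line.strip()
        inputs.append(line if line.startswith(PREFIX) else PREFIX + line)
    return inputs


print(build_inputs(["丹枫江冷人初去", "对联：春回大地，对对黄莺鸣暖树"]))
# ['对联：丹枫江冷人初去', '对联：春回大地，对对黄莺鸣暖树']
```

The returned list can be passed to `batch_generate` or to `model.predict` as shown earlier.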

	Model files:
	```
	t5-chinese-couplet
	├── config.json
	├── model_args.json
	├── pytorch_model.bin
	├── special_tokens_map.json
	├── tokenizer_config.json
	├── spiece.model
	└── vocab.txt
	```


	### Training dataset
	#### Chinese couplet dataset

	- Data: [couplet dataset (GitHub)](https://github.com/wb14123/couplet-dataset), [cleaned couplet dataset (GitHub)](https://github.com/v-zich/couplet-clean-dataset)
	- Related resources
	  - [Huggingface](https://huggingface.co/)
	  - Langboat Chinese [Mengzi T5 pretrained model](https://huggingface.co/Langboat/mengzi-t5-base) and [paper](https://arxiv.org/pdf/2110.06696.pdf)
	  - [textgen](https://github.com/shibing624/textgen)


	Data format:

	```text
	head -n 1 couplet_files/couplet/train/in.txt
	晚风摇树树还挺

	head -n 1 couplet_files/couplet/train/out.txt
	晨露润花花更红
	```
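
The two files are parallel: line *n* of `in.txt` is the first line of a couplet and line *n* of `out.txt` is its matching second line. The sketch below pairs them into rows using the column names from the results table (`prefix`, `input_text`, `target_text`). `load_couplet_pairs` is a hypothetical helper, and stripping whitespace inside each line is an assumption made so that space-separated variants of the files are also handled; the exact input format textgen's training script expects should be checked against the textgen documentation.

```python
from pathlib import Path


def load_couplet_pairs(in_path, out_path, prefix="对联："):
    """Pair line n of in_path with line n of out_path as one training row."""
    first_lines = Path(in_path).read_text(encoding="utf-8").splitlines()
    second_lines = Path(out_path).read_text(encoding="utf-8").splitlines()
    if len(first_lines) != len(second_lines):
        raise ValueError("in.txt and out.txt must have the same number of lines")

    rows = []
    for first, second in zip(first_lines, second_lines):
        # Remove any whitespace inside the line (assumption, see lead-in)
        first = "".join(first.split())
        second = "".join(second.split())
        if not first or not second:
            continue  # skip pairs where either half is empty
        rows.append({"prefix": prefix, "input_text": first, "target_text": second})
    return rows
```

For example, `load_couplet_pairs("couplet_files/couplet/train/in.txt", "couplet_files/couplet/train/out.txt")` would return one dictionary per couplet.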


	To train the T5 model yourself, see the [couplet generation model comparison](https://github.com/shibing624/textgen/blob/main/docs/%E5%AF%B9%E8%81%94%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B%E5%AF%B9%E6%AF%94.md) document in textgen.


	## Citation

	```latex
	@software{textgen,
	  author = {Xu Ming},
	  title = {textgen: Implementation of Text Generation models},
	  year = {2022},
	  url = {https://github.com/shibing624/textgen},
	}
	```