# dialogue-bart-large-chinese
This is a seq2seq model fine-tuned from bart-large-chinese on several Chinese dialogue datasets.
## Datasets
We utilize four Chinese dialogue datasets from LUGE:
| Dataset | Count | Domain |
| --- | --- | --- |
| Chinese Persona Chat (CPC) | 23,000 | Open |
| LCCC | 11,987,759 | Open |
| Emotional STC (ESTC) | 899,207 | Open |
| KdConv | 3,000 | Movie, Music, Travel |
## Example
```python
from transformers import BertTokenizer, BartForConditionalGeneration

# Note: the tokenizer is a BertTokenizer, not a BartTokenizer
tokenizer = BertTokenizer.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")

# An example from the CPC dev data: prior dialogue turns joined by [SEP]
dialogue_history = "可以 认识 一下 吗 ? [SEP] 当然 可以 啦 , 你好 。 [SEP] 嘿嘿 你好 , 请问 你 最近 在 忙 什么 呢 ? [SEP] 我 最近 养 了 一只 狗狗 , 我 在 训练 它 呢 。"

# Encode the history and generate the next response
input_ids = tokenizer(dialogue_history, return_tensors='pt').input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
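
For multi-turn use, the history string can be assembled programmatically by joining the turns with the tokenizer's `[SEP]` token, matching the format of the CPC example above. The sketch below is a minimal, hypothetical wrapper: the `respond` helper and the `num_beams`/`max_new_tokens` generation settings are our own illustrative choices, not part of this model card.

```python
from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")

def respond(turns, max_new_tokens=64):
    # Hypothetical helper: join prior turns with [SEP], as in the CPC example
    history = f" {tokenizer.sep_token} ".join(turns)
    input_ids = tokenizer(history, return_tensors="pt").input_ids
    # Beam-search settings are assumptions, not documented defaults
    output_ids = model.generate(input_ids, num_beams=4,
                                max_new_tokens=max_new_tokens)[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)

print(respond(["可以 认识 一下 吗 ?", "当然 可以 啦 , 你好 。"]))
```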