
# dialogue-bart-large-chinese

This is a seq2seq model fine-tuned from bart-large-chinese on several Chinese dialogue datasets.

## Datasets

We use four Chinese dialogue datasets from LUGE (Language Understanding and Generation Evaluation):

| Dataset | Count | Domain |
| --- | --- | --- |
| Chinese Persona Chat (CPC) | 23,000 | Open |
| LCCC | 11,987,759 | Open |
| Emotional STC (ESTC) | 899,207 | Open |
| KdConv | 3,000 | Movie, Music, Travel |

## Example

```python
from transformers import BertTokenizer, BartForConditionalGeneration

# Note: the tokenizer is a BertTokenizer, not a BartTokenizer.
tokenizer = BertTokenizer.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")

# An example from the CPC dev data (note the whitespace-segmented text).
history = ["可以 认识 一下 吗 ?", "当然 可以 啦 , 你好 。", "嘿嘿 你好 , 请问 你 最近 在 忙 什么 呢 ?", "我 最近 养 了 一只 狗狗 , 我 在 训练 它 呢 。"]

# Prefix the joined history with "历史:" ("history:") to build the model input.
history_str = "历史:" + tokenizer.sep_token.join(history)
input_ids = tokenizer(history_str, return_tensors='pt').input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids))
```
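
For multi-turn use, one option is to append each generated reply back onto the history and rebuild the prompt each turn. The sketch below reuses the `tokenizer` and `model` loaded above; the generation settings (`max_length`, `num_beams`) and the use of `skip_special_tokens` are illustrative assumptions, not settings documented for this model.

```python
# A minimal multi-turn sketch (assumes `tokenizer` and `model` from the example above).
history = ["可以 认识 一下 吗 ?"]
for _ in range(3):
    history_str = "历史:" + tokenizer.sep_token.join(history)
    input_ids = tokenizer(history_str, return_tensors="pt").input_ids
    # max_length and num_beams are illustrative choices, not documented defaults.
    output_ids = model.generate(input_ids, max_length=64, num_beams=4)[0]
    # skip_special_tokens=True drops markers like [SEP] from the decoded reply.
    reply = tokenizer.decode(output_ids, skip_special_tokens=True)
    print(reply)
    history.append(reply)
```

Note that `tokenizer.decode` keeps special tokens by default, which is why the sketch passes `skip_special_tokens=True` before feeding the reply back into the history.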