MarianTokenizer can't encode Chinese

#11

by thehonestbob - opened May 23, 2023

Discussion

thehonestbob

May 23, 2023


edited May 23, 2023

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
text = '你好我的朋友呀，你来自哪里'  # "Hello my friend, where are you from?"
result = tokenizer.decode(tokenizer.encode(text))
print(result)
The printed result is only ",". What can I do?

milimili110

Jun 13, 2023

If you want to encode Chinese, you could load opus-mt-zh-en instead, whose input side is Chinese. I am not sure, though.
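A toy, pure-Python sketch of why the model direction may matter, under stated assumptions: that MarianTokenizer's `encode` uses the source-side (for en-zh, English) SentencePiece model, that characters missing from that vocabulary become `<unk>`, and that normalization is NFKC-like (SentencePiece's default). The vocabularies `ENGLISH_VOCAB` and `CHINESE_VOCAB` and the helpers `encode`/`decode` are hypothetical and character-level; they are not the real opus-mt files or the transformers API.

```python
import unicodedata

UNK_ID = 0

# Hypothetical, character-level toy vocabularies (NOT the real opus-mt files):
# one stands in for an English source side, one for a Chinese source side.
ENGLISH_VOCAB: dict[str, int] = {"<unk>": UNK_ID, "h": 1, "e": 2, "l": 3, "o": 4, ",": 5, " ": 6}
CHINESE_VOCAB: dict[str, int] = {"<unk>": UNK_ID}
CHINESE_VOCAB.update({ch: i for i, ch in enumerate("你好我的朋友呀,来自哪里", start=1)})


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    # NFKC maps the full-width comma "，" to ASCII ","; CJK ideographs are unchanged.
    normalized = unicodedata.normalize("NFKC", text)
    # Any character missing from the vocabulary becomes the unknown id.
    return [vocab.get(ch, UNK_ID) for ch in normalized]


def decode(ids: list[int], vocab: dict[str, int], skip_unknown: bool = True) -> str:
    inverse = {i: piece for piece, i in vocab.items()}
    # Dropping <unk> here is only for readability in this toy.
    return "".join(inverse[i] for i in ids if not (skip_unknown and i == UNK_ID))


text = "你好我的朋友呀，你来自哪里"  # "Hello my friend, where are you from?"
print(decode(encode(text, ENGLISH_VOCAB), ENGLISH_VOCAB))  # prints ","
print(decode(encode(text, CHINESE_VOCAB), CHINESE_VOCAB))  # prints "你好我的朋友呀,你来自哪里"
```

Under these assumptions, the English-side vocabulary keeps only the comma once unknown tokens are dropped, which may explain the "," reported above, while a Chinese-side vocabulary round-trips the sentence. With the real library, the suggestion above corresponds to `AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")`; recent transformers releases also accept `tokenizer(text_target=text)`, which encodes with the target-side model and may be the better fit if the Chinese text is the translation target of the en-zh model.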
