MarianTokenizer can't encode Chinese
#11 opened by thehonestbob
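from transformers import AutoTokenizer

# The model name must be passed as a string.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
text = '你好我的朋友呀,你来自哪里'
# Encode the Chinese text to ids, then decode the ids back to a string.
result = tokenizer.decode(tokenizer.encode(text))
print(result)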
The result is just a comma: ,
What can I do?
If you want to encode Chinese, you could load opus-mt-zh-en instead, where Chinese is the source language. I am not sure, though.
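A rough sketch of how one might check this, in case it helps. As far as I understand, a Marian tokenizer encodes with its source-language vocabulary by default, so with opus-mt-en-zh the Chinese characters would be treated as unknown English-side input. The `roundtrip` and `demo` helpers below are hypothetical names made up for this post. The sketch also assumes a reasonably recent transformers release in which the tokenizer call accepts `text_target`. I have not listed expected output because I have not verified it.

```python
def roundtrip(tokenizer, text, as_target=False):
    """Encode `text`, decode the ids again, and return the decoded string.

    With as_target=True the text is passed as `text_target`, which asks the
    tokenizer to treat it as target-language text (assumption: the installed
    transformers version supports the `text_target` argument).
    """
    if as_target:
        ids = tokenizer(text_target=text)["input_ids"]
    else:
        ids = tokenizer(text)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)


def demo():
    # Imported here so roundtrip() can be used without transformers installed.
    from transformers import AutoTokenizer

    text = "你好我的朋友呀,你来自哪里"

    # en-zh: English is the source language, Chinese is the target.
    en_zh = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
    print("en-zh, as source:", roundtrip(en_zh, text))
    print("en-zh, as target:", roundtrip(en_zh, text, as_target=True))

    # zh-en: Chinese is the source language, as suggested in the reply above.
    zh_en = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    print("zh-en, as source:", roundtrip(zh_en, text))
```

Calling `demo()` downloads both checkpoints from the Hub and prints the three round trips side by side. If the second or third line keeps the Chinese text while the first does not, that would point to the source/target vocabulary mismatch rather than a bug in the tokenizer.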