60 languages?
#8
by
conan1024hao
- opened
In the model description, you said flan-t5 was trained on 60 languages (including Japanese, etc.). However, the vocab_size
is only 32138, so how could it handle 60 languages?
I think this is impossible with a SentencePiece vocabulary that small.
Same issue, the tokenizer doesn't understand Arabic.
Same issue, the tokenizer doesn't understand Chinese.
Neither Vietnamese!
Hello everyone, thanks for the issue and sorry for the confusion.
I think Google
has open-sourced only the English versions at the moment. We posted a ticket on their repository to track the issue: https://github.com/google-research/t5x/issues/1131
Same problem with Korean; the tokenizer can't recognize Korean tokens.
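The behavior everyone is reporting can be reproduced without downloading the checkpoint: a SentencePiece-style tokenizer whose vocabulary was built almost entirely from English text has no pieces covering CJK, Arabic, or Korean characters, so those inputs collapse to `<unk>`. The sketch below uses a toy English-only vocabulary and a greedy longest-match tokenizer as a stand-in (it is not flan-t5's actual vocabulary or algorithm) to illustrate the failure mode:

```python
# Toy simulation of an English-heavy subword vocabulary: any character
# with no matching piece falls back to <unk>, which is what the posters
# above observe for Japanese/Chinese/Arabic/Korean text.

def tokenize(text, vocab, unk="<unk>"):
    """Greedy longest-match tokenization over a fixed vocab set."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No piece in the vocab covers this character.
            tokens.append(unk)
            i += 1
    return tokens

# Toy vocab: ASCII letters plus a couple of whole words, no CJK coverage.
vocab = {chr(c) for c in range(ord("a"), ord("z") + 1)} | {"hello", " "}

english = tokenize("hello world", vocab)   # real pieces, no <unk>
japanese = tokenize("こんにちは", vocab)    # every character becomes <unk>
print(english)
print(japanese)
```

With flan-t5's ~32k vocabulary the situation is the same in kind: the pieces that exist are overwhelmingly English subwords, so non-Latin scripts have nothing to match against.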