Upload tokenizer.json (#16)
opened by jonatanklosko
Generated with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
assert tokenizer.is_fast
tokenizer.save_pretrained("...")
```
Looks good to me! cc @sanchit-gandhi
As discussed with @ArthurZ on the PR, the fast tokenizer can always be loaded from the slow one: https://github.com/huggingface/transformers/pull/27338/files#r1384935617. So there's no issue with not having the tokenizer.json. That said, happy to merge this PR to improve clarity for the Hub weights.
@sanchit-gandhi yeah, the thing is that the Rust huggingface/tokenizers library can only load tokenizer.json. In the Elixir ecosystem we have bindings to huggingface/tokenizers and so rely solely on fast tokenizers :)
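To illustrate the point above: tokenizer.json is a single self-contained serialization (vocab, normalizer, pre-tokenizer, etc.) that the Rust huggingface/tokenizers library reads back directly, with no slow-tokenizer fallback. A minimal sketch using a tiny word-level tokenizer as a stand-in for a real one (the file name and vocab here are just for illustration):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Build a tiny word-level tokenizer as a stand-in for a real one.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Serialize the whole tokenizer to the single-file format that the
# Rust library (and hence its Elixir bindings) can load back directly.
tokenizer.save("tokenizer.json")
reloaded = Tokenizer.from_file("tokenizer.json")

print(reloaded.encode("hello world").ids)  # expect [1, 2]
```

This is the same file format that the `save_pretrained` call above writes out as `tokenizer.json` for fast tokenizers.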
Thanks for the explanation! Makes sense - let's merge this one then @ArthurZ @patrickvonplaten
patrickvonplaten changed pull request status to merged