ct2 converter command raises vocabulary size error
#3, opened by nazimali
When I run the ct2 converter command mentioned in the README, it fails with a vocabulary size error. Since you ran the converter command on 2023-06-06, I checked the original model's commit history, and there have been no new commits since 2023-06-06. Can anyone reproduce this, or does anyone have any insight into what causes it? Thanks in advance.
ct2-transformers-converter --model OpenAssistant/falcon-7b-sft-top1-696 --output_dir ~/tmp-ct2fast-falcon-7b-sft-top1-696 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
Traceback:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/llm/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/transformers.py", line 1577, in main
    converter.convert_from_args(args)
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 97, in convert
    model_spec.validate()
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 561, in validate
    raise ValueError(
ValueError: Vocabulary has size 65029 but the model expected a vocabulary of size 65040
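To make the mismatch in the error message concrete, here is a minimal sketch that compares the two sizes. It assumes the first number (65029) is the tokenizer's vocabulary size and the second (65040) is the number of rows in the model's embedding matrix; the thread itself does not confirm this. The helper names `vocab_gap` and `describe_gap` are hypothetical and are not part of ctranslate2 or transformers.

```python
# Sketch only: vocab_gap and describe_gap are hypothetical helpers,
# not functions from ctranslate2 or transformers.
#
# With transformers installed, the two sizes could likely be read with:
#   len(AutoTokenizer.from_pretrained(name))
#   AutoConfig.from_pretrained(name, trust_remote_code=True).vocab_size
# Here the values are taken straight from the error message instead.


def vocab_gap(tokenizer_size: int, model_size: int) -> int:
    """Return model_size - tokenizer_size (positive if the model is larger)."""
    return model_size - tokenizer_size


def describe_gap(tokenizer_size: int, model_size: int) -> str:
    """Summarize the difference between the two vocabulary sizes."""
    gap = vocab_gap(tokenizer_size, model_size)
    if gap == 0:
        return "sizes match"
    if gap > 0:
        return f"model expects {gap} more entries than the vocabulary provides"
    return f"vocabulary has {-gap} more entries than the model expects"


# Values from the ValueError in the traceback.
print(describe_gap(65029, 65040))
# prints: model expects 11 more entries than the vocabulary provides
```

A gap of 11 entries is what the converter's validation step rejects here.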
Library versions:
ctranslate2 3.16.0
hf-hub-ctranslate2 2.12.0
transformers 4.30.2
torch 2.0.1
You will have to wait for the ctranslate2 3.17.0 release or build from source. See: https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1275
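For anyone who wants to check their environment before rerunning the converter, here is a rough sketch that compares the installed ctranslate2 version against 3.17.0, the release named in the reply above. It uses only the standard library. The helper names `parse_version`, `has_fix`, and `installed_ctranslate2` are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release named in the reply above as the first one expected to work.
MIN_VERSION = (3, 17, 0)


def parse_version(text: str) -> tuple:
    """Turn 'X.Y.Z' into a 3-tuple of ints; missing or odd parts become 0."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_fix(installed: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_ctranslate2():
    """Return the installed ctranslate2 version string, or None if absent."""
    try:
        return version("ctranslate2")
    except PackageNotFoundError:
        return None


current = installed_ctranslate2()
if current is None:
    print("ctranslate2 is not installed")
elif has_fix(current):
    print(f"ctranslate2 {current} is at least 3.17.0")
else:
    print(f"ctranslate2 {current} is older than 3.17.0; upgrade or build from source")
```

With the versions listed in the original post (ctranslate2 3.16.0), `has_fix("3.16.0")` returns False, so the last branch would run.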
michaelfeil changed discussion status to closed