Vocabulary size mismatch between pre-trained and finetuned versions
#12
by singhsonali0503
Hello.
This is probably a naive question, but I am curious.
From the config.json of the pretrained TinyLlama checkpoints, I see that the models were trained with a vocab size of 32000.
However, the finetuned-for-chat versions of TinyLlama appear to use a vocabulary size of 32001.
Is there a specific reason for it?
Further, my code was written against the pretrained version and expects a vocab size of 32000, so the finetuned chat version no longer works as a drop-in replacement. Any suggestions on how I can get the chat version of TinyLlama to work?
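For reference, this is roughly how the mismatch shows up for me (the repo IDs below are just the checkpoints I happen to be comparing, so treat them as placeholders):

```python
from transformers import AutoConfig

# Placeholder repo IDs for the pretrained and chat checkpoints I'm comparing.
base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
chat_id = "TinyLlama/TinyLlama-1.1B-Chat-v0.6"

base_cfg = AutoConfig.from_pretrained(base_id)
chat_cfg = AutoConfig.from_pretrained(chat_id)

print(base_cfg.vocab_size)  # 32000 in the pretrained config.json
print(chat_cfg.vocab_size)  # 32001 in the chat config.json

# My code hard-codes 32000 (e.g. for the logits / embedding shape),
# so the chat model's extra token id is where things break for me.
assert chat_cfg.vocab_size == 32000, "vocab size mismatch"
```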
Thanks in advance.