description for special tokens
Hi team,
Could you please describe the special tokens below, which are used by the sqlcoder model? What does each special token refer to? That would help us understand them.
['<s>', '</s>', '<unk>', '▁<PRE>', '▁<MID>', '▁<SUF>', '▁<EOT>', '▁<PRE>', '▁<MID>', '▁<SUF>', '▁<EOT>']
Hi @Iamexperimenting, these are exactly the same special tokens that Code Llama uses. The first two, <s> and </s>, are the beginning-of-sequence and end-of-sequence tokens, and <unk> stands in for unknown tokens. The rest (<PRE>, <MID>, <SUF>, <EOT>) are for infilling, which we do not support; we kept them for backwards compatibility in case you want to test the model's code infilling abilities. You can check out the Code Llama tokenizer documentation here:
https://huggingface.co/docs/transformers/model_doc/code_llama#transformers.CodeLlamaTokenizer
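
To make the roles concrete, here is a minimal Python sketch that maps each token from the list above to the purpose described in this reply, and shows roughly how the infilling tokens frame a prompt. The helper names (describe, build_infill_prompt) are hypothetical and not part of any library. The prompt layout follows the prefix-suffix-middle order that Code Llama describes, but the exact spacing is an assumption, and sqlcoder is not trained for infilling, so treat this as an illustration only.

```python
# Roles of the special tokens, as described in the reply above.
SPECIAL_TOKEN_ROLES = {
    "<s>": "beginning of sequence",
    "</s>": "end of sequence",
    "<unk>": "placeholder for unknown tokens",
    "<PRE>": "infilling: marks the start of the prefix (code before the gap)",
    "<SUF>": "infilling: marks the start of the suffix (code after the gap)",
    "<MID>": "infilling: marks where the model starts generating the middle",
    "<EOT>": "infilling: marks the end of the generated middle",
}


def describe(token: str) -> str:
    """Return the role of a special token (hypothetical helper).

    The tokenizer may print the infilling tokens with a leading marker
    character, so everything before the first '<' is dropped first.
    """
    core = token[token.index("<"):]
    return SPECIAL_TOKEN_ROLES[core]


def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Lay out a prefix-suffix-middle prompt as plain text (hypothetical helper).

    Assumption: the spacing mirrors the Code Llama convention; a real
    tokenizer inserts the special token ids itself instead of using text.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


if __name__ == "__main__":
    for token in ["<s>", "</s>", "<unk>", "▁<PRE>", "▁<MID>", "▁<SUF>", "▁<EOT>"]:
        print(f"{token!r}: {describe(token)}")
    print(build_infill_prompt("SELECT name FROM users WHERE", ";"))
```

In an infilling setup, the model would be expected to continue after <MID> with the missing middle and then emit <EOT> to signal that the gap is filled.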