add_special_tokens=False results in poor generation

#80
by DMaksimov - opened

Hi!

I recently ran an experiment from a model card using a chat template and encountered some issues. In the first attempt, as shown in the attached image, the results were not satisfactory.

image.png

However, when I modified the settings to include the special tokens by setting add_special_tokens=True, the output improved significantly:

image.png

Could you please explain the rationale behind using add_special_tokens=False in this example?

Google org
edited Aug 14

Hi @DMaksimov, if we were preparing inputs for standard tasks like text classification, text generation, or translation using pre-trained models without any customized processing, we would typically set add_special_tokens=True to ensure the input is in the format the model expects.

add_special_tokens=False is likely used here because the author wants to control how the special tokens are handled, either to meet specific model requirements or because they are added elsewhere (for example, by the chat template itself). Please refer to this link for more information. Thank you.
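One common reason, sketched below as a toy illustration (this is not the real transformers API, and the template format and BOS handling are assumptions; check tokenizer.chat_template for your model): Gemma-style chat templates typically already prepend <bos> to the formatted prompt, so tokenizing that string again with add_special_tokens=True would duplicate the <bos> token.

```python
# Toy sketch: why add_special_tokens=False can be needed after a chat template.
# The template string and tokenizer behavior here are simplified stand-ins.

BOS = "<bos>"

def apply_chat_template(messages):
    # Assumption: the chat template already prepends <bos>, as Gemma's does.
    text = BOS
    for m in messages:
        text += f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n"
    return text

def tokenize(text, add_special_tokens):
    # Stand-in for tokenizer(...): optionally prepends <bos> itself.
    if add_special_tokens:
        text = BOS + text
    return text

prompt = apply_chat_template([{"role": "user", "content": "Hi!"}])

with_specials = tokenize(prompt, add_special_tokens=True)
without_specials = tokenize(prompt, add_special_tokens=False)

print(with_specials.count(BOS))     # 2 -> <bos> duplicated
print(without_specials.count(BOS))  # 1 -> exactly one <bos>
```

So whether add_special_tokens=False is correct depends on whether the prompt string already carries the special tokens; if you tokenize a raw, untemplated string, you still want add_special_tokens=True.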


Based on this testing, can we conclude that the model is sensitive to the <bos> token?

While evaluating the Gemma-2 model in the evaluation harness library, I also saw a warning in the output saying that omitting the <bos> token can significantly impact the performance of Gemma models.
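Given that sensitivity, a simple sanity check before generation is to count <bos> occurrences in the tokenized prompt and confirm there is exactly one. The helper and token ids below are hypothetical (with transformers you would inspect tokenizer(prompt)["input_ids"] and use tokenizer.bos_token_id):

```python
# Hypothetical sanity check: ensure exactly one <bos> in the input ids.
# The bos_id value and the example id sequences are made up for illustration.

def count_bos(token_ids, bos_id=2):
    """Count how many times the BOS id appears in a token id sequence."""
    return sum(1 for t in token_ids if t == bos_id)

ids_ok = [2, 106, 1645, 108]         # one <bos> at the start
ids_double = [2, 2, 106, 1645, 108]  # duplicated <bos>
ids_missing = [106, 1645, 108]       # no <bos> at all

print(count_bos(ids_ok))       # 1 -> as expected
print(count_bos(ids_double))   # 2 -> template + add_special_tokens both added it
print(count_bos(ids_missing))  # 0 -> likely to hurt Gemma's output quality
```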
