RoPE scaling and max_position_embeddings
Hello,
In config.json, a linear rope_scaling with a factor of 8 is defined, and max_position_embeddings has been increased to 32768.
However, the Hugging Face Llama 2 documentation specifies that when a RoPE scaling strategy is used, max_position_embeddings should not be updated:
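For concreteness, here is a minimal sketch of just those two settings expressed through transformers' LlamaConfig (all other fields omitted; the values are the ones described above):

```python
from transformers import LlamaConfig

# The two settings in question: linear RoPE scaling with factor 8,
# while max_position_embeddings already stores the scaled length.
config = LlamaConfig(
    max_position_embeddings=32768,
    rope_scaling={"type": "linear", "factor": 8.0},
)

print(config.max_position_embeddings)  # 32768
print(config.rope_scaling)             # {'type': 'linear', 'factor': 8.0}
```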
https://huggingface.co/docs/transformers/main/model_doc/llama2#transformers.LlamaConfig.rope_scaling
Wouldn't the existing config result in the RoPE scaling being applied twice (especially when setting trust_remote_code=False)?
If so, this should be fixed.
Hi @ag0, thanks for bringing this up! I think this only affects NTK scaling, not the linear scaling that is used here: https://github.com/huggingface/transformers/blob/fdd81aea12f06e24ab5cf5ba3c7316df3ab1a779/src/transformers/models/llama/modeling_llama.py#L135-L144
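For reference, here is a rough sketch (not the exact library code; dim, base, and the original context length are illustrative Llama-style defaults) of how the two variants compute the rotary frequencies. Only the dynamic NTK path reads max_position_embeddings, which is why an already-enlarged value would only skew that variant:

```python
import torch


def linear_scaled_freqs(seq_len, dim=128, base=10000.0, scaling_factor=8.0):
    # Linear scaling: position indices are divided by a fixed factor.
    # max_position_embeddings is never read, so enlarging it in
    # config.json does not change these frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float() / scaling_factor
    return torch.outer(t, inv_freq)


def dynamic_ntk_scaled_freqs(seq_len, dim=128, base=10000.0,
                             scaling_factor=8.0, max_position_embeddings=4096):
    # Dynamic NTK scaling: the rotary base is rescaled as a function of
    # seq_len / max_position_embeddings, so a config that already stores
    # the scaled length would effectively apply the scaling twice.
    if seq_len > max_position_embeddings:
        base = base * (
            (scaling_factor * seq_len / max_position_embeddings)
            - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float()
    return torch.outer(t, inv_freq)
```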
Let us know what you think! :)