Max position embeddings
I think that for CodeLlama, "max_position_embeddings": 16384, is the correct line; 4096 is for Llama 2.
This is true; however, the fine-tuning data only had items going up to 4k context. I can change it back to 16384, but results may not be great beyond 4k.
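For anyone who wants to double-check the values being discussed, here's a minimal sketch that reads the two relevant config fields; the base CodeLlama repo id is just used for illustration, so swap in whichever repo you actually care about:

```python
# Minimal sketch: inspect the context-related config fields that differ
# between Llama 2 and CodeLlama. The repo id here is only an example.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("codellama/CodeLlama-34b-hf")

print(cfg.max_position_embeddings)  # 16384 for CodeLlama; Llama 2 ships with 4096
print(cfg.rope_theta)               # 1000000.0 for CodeLlama; Llama 2 uses 10000.0
```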
Can fine-tuning at 4k affect CodeLlama's RoPE, when the base model was trained at 16k?
Anyway, that's what I have in TheBloke's GGUF quants of SB 2.2:
llm_load_print_meta: n_ctx_train = 16384
llm_load_print_meta: n_ctx = 16384 (my pick)
llm_load_print_meta: n_embd = 8192
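For what it's worth, n_ctx_train above comes from the GGUF metadata, while n_ctx is simply whatever you pass at load time; a minimal llama-cpp-python sketch, where the file name is an assumption:

```python
# Sketch: the runtime context window is chosen at load time via n_ctx,
# independently of the n_ctx_train value stored in the GGUF metadata.
from llama_cpp import Llama

llm = Llama(
    model_path="spicyboros-c34b-2.2.Q4_K_M.gguf",  # assumed filename for the quant
    n_ctx=16384,                                   # the "my pick" value above
)
print(llm.n_ctx())  # 16384
```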
It's difficult for us laypeople to follow all these changes from the base model to the fine-tune (that part I understand, even if I wonder whether the fine-tuning completely overrides the base model's initial training), and then to the quant, which displays different values! ^^
Anyway, thanks for your amazing work, Jon. I've been hooked since your very first version, which I downloaded as soon as it was on HF.
It's probably fine TBH, I was just playing it safe.
I don't think the fine-tuning at 4k would completely degrade the 16k performance. I would imagine the model will be reluctant to produce more than 4k tokens, and things like contextual question answering may suffer beyond that point simply due to the lack of fine-tuning data at those lengths, but I haven't had the time to really analyze it.
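One rough way to probe that (just a sketch, not a proper benchmark; the file name, filler text, and token counts are assumptions) is to bury a fact well past the 4k mark and see whether the model can still retrieve it:

```python
# Rough long-context probe: plant a "needle" beyond the 4k fine-tuning length
# and check whether the model can answer a question about it.
from llama_cpp import Llama

llm = Llama(model_path="spicyboros-c34b-2.2.Q4_K_M.gguf", n_ctx=16384)  # assumed filename

filler = "The quick brown fox jumps over the lazy dog. " * 800  # very roughly 7-8k tokens
needle = "For the record, the access code is 7431. "
prompt = filler + needle + "\nQuestion: What is the access code?\nAnswer:"

out = llm(prompt, max_tokens=16)
print(out["choices"][0]["text"])
```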
Glad you like the models!