ROPE SETTINGS ARE WRONG
In the config, the RoPE settings are wrong and carried over from the older model. The new model has a larger max token limit, which is not set correctly. Additionally, there is this error, which asks for only two values in the RoPE config:
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
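For context, this error seems to come from a transformers build that only understands the legacy two-field rope_scaling schema, while the Llama 3.1 config ships the newer llama3 schema. A minimal sketch of a possible workaround, assuming a locally downloaded checkpoint (the path below is a placeholder, and rewriting to the legacy linear format is an assumption rather than a confirmed fix), is to patch config.json before loading:

import json
from pathlib import Path

# Placeholder path to the locally downloaded checkpoint; adjust to your setup.
config_path = Path("path/to/Meta-Llama-3.1-8B-Instruct/config.json")

config = json.loads(config_path.read_text())

# Replace the new-style llama3 rope_scaling block with the legacy two-field
# form that older transformers versions validate against, and raise the
# context length to the 3.1 value.
config["rope_scaling"] = {"type": "linear", "factor": 8.0}
config["max_position_embeddings"] = 131072

config_path.write_text(json.dumps(config, indent=2))

Upgrading transformers to a release that understands rope_type: llama3 should avoid the manual edit entirely.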
Thank you for your input. Have you found the correct way to fix the config?
@RaccoonOnion I think the config must be changed, but I'm not sure what settings should be used. I just know original_max_position_embeddings: 8192 is wrong and it must be 131072 or 128000, because the new models have extended context to 128k.
Not with unsloth, but I've run into the same issue when trying to convert the new 3.1 model to GPTQ.
I was able to pop into the downloaded model's config.json file, manually edit the settings, and run with success. The same might apply here?
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "type": "linear"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
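As a quick sanity check (a sketch, not something confirmed in this thread; the directory name is a placeholder), the edited config can be loaded on its own to confirm the ValueError is gone before running the conversion or fine-tune:

from transformers import AutoConfig

# Placeholder path to the locally edited checkpoint directory.
model_dir = "path/to/Meta-Llama-3.1-8B-Instruct"

# This is the step where the original rope_scaling validation error was raised.
cfg = AutoConfig.from_pretrained(model_dir)

# Should now reflect the manually edited values.
print(cfg.rope_scaling)
print(cfg.max_position_embeddings)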