Update max_position_embeddings

#2
by FlorianJc - opened

If the model can really handle a 128k context, you should set max_position_embeddings and max_length to 131072.

Otherwise, vLLM rejects max_model_len > 8192.
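
For reference, a minimal sketch of what this would look like on the vLLM side, assuming max_position_embeddings in config.json is raised to 131072 (the repo id, prompt, and sampling settings below are placeholders, not taken from this discussion):

from vllm import LLM, SamplingParams

# Placeholder repo id; substitute this model's actual Hub id.
MODEL_ID = "ghost-x/ghost-8b-beta"

# With max_position_embeddings raised to 131072 in config.json, vLLM
# accepts a matching max_model_len instead of refusing anything above 8192.
llm = LLM(model=MODEL_ID, max_model_len=131072)
outputs = llm.generate(
    ["<your long-context prompt>"],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)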

Ghost X org

This is the 8k-context version; we use LongLM to extend the context. You can refer to it here.

Note: reusing the Llama model source code loses some of the context-extension code, but don't worry, it still works fine when used with LongLM.

import torch
from transformers import AutoModelForCausalLM
import SelfExtend  # from the LongLM repository

# model_id is the path or Hub id of this model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
# Apply SelfExtend to stretch the 8k pretrained context via grouped attention.
SelfExtend.apply(
    model,
    group_size=16,
    window_size=512,
    enable_flash_attention=True,
    flash_attention_impl="flash_attn",
)
# Maximum extended length = group_size * (8192 - window_size) + window_size
# = 16 * (8192 - 512) + 512 = 123392
model.generation_config.max_length = 123392
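
For completeness, a rough usage sketch after the SelfExtend call; the tokenizer loading, prompt, and generation settings are assumptions for illustration, not part of the original reply:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# long_document is a placeholder for your long input text.
prompt = long_document + "\n\nSummarize the text above."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# SelfExtend reuses the 8k-trained positions via grouped attention, so
# generation over a much longer prompt works without fine-tuning.
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))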
