Error when setting max_model_len to 65536 for Ministral-8B-Instruct-2410 on A100 | VLLM
#10 · opened by Byerose
Description
I am trying to run the Ministral-8B-Instruct-2410 model on 8 A100 GPUs, and I set max_model_len to 65536. Below is the code snippet I am using:
```python
from vllm import LLM

# model_id and args are defined elsewhere in the calling script
llm = LLM(
    model=model_id,
    tensor_parallel_size=4,
    max_model_len=args.max_model_token,
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)
```
Error Message
When running the above code, I encountered the following error:
```
ValueError: User-specified max_model_len (65536) is greater than the derived max_model_len (max_position_embeddings=32768 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
```
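For reference, the 32768 limit is the max_position_embeddings value that vLLM derives from the model's config.json. A minimal sketch of confirming that value with the transformers library (assuming the mistralai/Ministral-8B-Instruct-2410 Hub repository is accessible; substitute a local path if the weights are already downloaded):

```python
from transformers import AutoConfig

# Assumed Hub id; a local model directory works the same way.
config = AutoConfig.from_pretrained("mistralai/Ministral-8B-Instruct-2410")

# The error above reports this as 32768, which caps the derived max_model_len.
print(config.max_position_embeddings)
```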
Steps to Reproduce
- Set up the environment on 8 A100 GPUs.
- Use the Ministral-8B-Instruct-2410 model.
- Attempt to set max_model_len=65536 and run the code.
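As the error message itself suggests, the cap can be overridden by setting VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 before the engine is constructed. Below is a minimal sketch of that workaround, keeping the same parameters as the snippet above and assuming model_id points at the Ministral-8B-Instruct-2410 repository; note the error's own warning that exceeding the derived limit may produce incorrect outputs or CUDA errors.

```python
import os

# Set the override before the vLLM engine is constructed so it is picked up.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

model_id = "mistralai/Ministral-8B-Instruct-2410"  # assumed Hub id

llm = LLM(
    model=model_id,
    tensor_parallel_size=4,
    max_model_len=65536,  # now permitted to exceed the derived 32768 cap
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)
```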