Error when setting max_model_len to 65536 for Ministral-8B-Instruct-2410 on A100 | VLLM

#10 opened by Byerose

Description

I am trying to run the Ministral-8B-Instruct-2410 model on 8 A100 GPUs with max_model_len set to 65536. Below is the code snippet I am using:

from vllm import LLM

llm = LLM(
    model=model_id,                      # Ministral-8B-Instruct-2410
    tensor_parallel_size=4,
    max_model_len=args.max_model_token,  # 65536 in this run
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)

Error Message

When running the above code, I encountered the following error:

ValueError: User-specified max_model_len (65536) is greater than the derived max_model_len (max_position_embeddings=32768 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
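As the message itself notes, the check can be bypassed by setting the environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 before the engine is created. A minimal sketch of that workaround (untested on this setup; the repo id mistralai/Ministral-8B-Instruct-2410 is assumed, and the override only silences the check rather than extending the model's configured context):

import os

# Set the override before vLLM builds its engine config; the error warns
# that exceeding the derived max_model_len may still cause incorrect
# outputs or CUDA errors.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

llm = LLM(
    model="mistralai/Ministral-8B-Instruct-2410",
    tensor_parallel_size=4,
    max_model_len=65536,
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)

Equivalently, the variable can be exported in the shell (export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1) before launching the script.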

Steps to Reproduce

  1. Set up the environment on 8 A100 GPUs.
  2. Use the Ministral-8B-Instruct-2410 model.
  3. Attempt to set max_model_len=65536 and run the code.
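For reference, the 32768 limit reported in the error can be confirmed by reading the model's config directly. A small sketch, assuming access to the (gated) mistralai/Ministral-8B-Instruct-2410 repo:

from transformers import AutoConfig

# vLLM derives its default max_model_len from this value in config.json;
# the error above reports it as 32768 for this model.
config = AutoConfig.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
print(config.max_position_embeddings)

Keeping max_model_len at or below that value avoids the error without setting the override.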
