Error when setting max_model_len to 65536 for Ministral-8B-Instruct-2410 on A100 | VLLM

#10 opened by Byerose

Description

I am trying to run the Ministral-8B-Instruct-2410 model on 8 A100 GPUs with max_model_len set to 65536. Below is the code snippet I am using:

from vllm import LLM

llm = LLM(
    model=model_id,                      # Ministral-8B-Instruct-2410
    tensor_parallel_size=4,
    max_model_len=args.max_model_token,  # 65536 in this run
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)

Error Message

When running the above code, I encountered the following error:

ValueError: User-specified max_model_len (65536) is greater than the derived max_model_len (max_position_embeddings=32768 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
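As the message itself notes, the check can be bypassed by setting the environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 before the engine is created. A minimal sketch of that workaround (untested on this setup; the repo id mistralai/Ministral-8B-Instruct-2410 is assumed, and the override only silences the check rather than extending the model's configured context):

import os

# Set the override before vLLM builds its engine config; the error warns
# that exceeding the derived max_model_len may still cause incorrect
# outputs or CUDA errors.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

llm = LLM(
    model="mistralai/Ministral-8B-Instruct-2410",
    tensor_parallel_size=4,
    max_model_len=65536,
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)

Equivalently, the variable can be exported in the shell (export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1) before launching the script.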

Steps to Reproduce

  1. Set up the environment on 8 A100 GPUs.
  2. Use the Ministral-8B-Instruct-2410 model.
  3. Attempt to set max_model_len=65536 and run the code.
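For reference, the 32768 limit reported in the error can be confirmed by reading the model's config directly. A small sketch, assuming access to the (gated) mistralai/Ministral-8B-Instruct-2410 repo:

from transformers import AutoConfig

# vLLM derives its default max_model_len from this value in config.json;
# the error above reports it as 32768 for this model.
config = AutoConfig.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
print(config.max_position_embeddings)

Keeping max_model_len at or below that value avoids the error without setting the override.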
