Molmo-7B-D-0924 OOM on A100 80GB using Quick Start code

by sasawq21

Using the quick start code from https://huggingface.co/allenai/Molmo-7B-O-0924 with the same input image, I got an OOM error on an A100 80GB GPU. Could you provide test code that runs on an A100 80GB? Code that runs on 40GB would be even better, thanks.

Run your generation inside with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):

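For context, here is roughly how that wrapper fits around the model card's quickstart (the generate_from_batch call and GenerationConfig arguments follow the quickstart; the picsum URL is the example image used there):

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# load the processor and model as in the quickstart
processor = AutoProcessor.from_pretrained('allenai/Molmo-7B-D-0924', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype='auto',
    device_map='auto'
)

# preprocess an example image and prompt
inputs = processor.process(
    images=[Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)],
    text="Describe this image."
)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# the autocast context keeps activations in bfloat16 during generation
with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )

# decode only the newly generated tokens
print(processor.tokenizer.decode(output[0, inputs['input_ids'].size(1):], skip_special_tokens=True))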

Thanks for the tip! This enabled me to get this running on a 4090 (24GB VRAM) on Windows. I wanted to share my solution for anyone else who might be running into this issue.

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# load both the processor and the model weights directly in bfloat16
processor = AutoProcessor.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

model = AutoModelForCausalLM.from_pretrained(
    'allenai/Molmo-7B-D-0924',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

This loads the full model into VRAM (bfloat16 weights take roughly half the fp32 footprint) and still leaves plenty of headroom for inference.
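If you want to sanity-check the headroom after loading, torch.cuda reports allocated versus total memory (a generic PyTorch check, not part of the quickstart):

# how much VRAM the loaded weights occupy vs. what the GPU has
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
print(f"total: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GiB")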

Prior to calling processor.process, I added:

with torch.no_grad():
    with torch.autocast("cuda", dtype=torch.bfloat16):

(the no_grad was a suggestion from o1-preview for memory savings; I'm not sure if it's needed, but it seems to work!)
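Put together with the loading snippet above, the preprocessing step looks something like this (a sketch; the image path and prompt are placeholders):

from PIL import Image

# no_grad skips autograd bookkeeping; autocast keeps intermediate tensors in bfloat16
with torch.no_grad():
    with torch.autocast("cuda", dtype=torch.bfloat16):
        inputs = processor.process(
            images=[Image.open("your_image.jpg")],  # placeholder image path
            text="Describe this image."
        )
        # move to the model's device and add a batch dimension, as in the quickstart
        inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}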

@mw44 I forgot to mention the bfloat16 weight loading, thanks for your comment :) no_grad is always nice to have; it has saved me a ton of VRAM with other transformers.
(In this case, generate_from_batch already applies no_grad internally, so you can leave it out here, but it's good practice.)
