CUDA out of memory

#19
by karambos - opened

CUDA out of memory. Tried to allocate 12.78 GiB (GPU 0; 15.73 GiB total capacity; 11.21 GiB already allocated; 2.47 GiB free; 12.19 GiB reserved in total by PyTorch)

I have a cluster of 4 GPUs with 16 GB each.
GPU memory usage after loading the model (MiB used / total):
0: 9246/16300
1: 9246/16300
2: 9246/16300
3: 8038/16300

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", cache_dir=".../.cache/huggingface/hub")
model = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics2-8b", cache_dir=".../.cache/huggingface/hub", device_map="auto", torch_dtype=torch.float16)
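For reference, the sharding chosen by accelerate can be inspected directly (the hf_device_map attribute is set whenever device_map="auto" is used):

# Sketch: print which GPU each module was placed on
print(model.hf_device_map)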

I changed this code so the weights are distributed across the GPUs, but when I run:

generated_ids = model.generate(**inputs, max_new_tokens=60)

GPU memory usage while running this line (MiB used / total):
0: 12700/16300
1: 9246/16300
2: 9246/16300
3: 8038/16300

I am getting an error here: CUDA out of memory. Tried to allocate 12.78 GiB (GPU 0; 15.73 GiB total capacity; 11.21 GiB already allocated; 2.47 GiB free; 12.19 GiB reserved in total by PyTorch)
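One possibly relevant knob here is the max_memory argument of from_pretrained, which caps how much of the model accelerate places on each GPU, so GPU 0 keeps headroom for the activations created during generate. A minimal sketch; the per-GPU limits below are illustrative assumptions, not tuned values:

import torch
from transformers import AutoModelForVision2Seq

# Sketch: leave headroom on GPU 0 for generation-time activations.
# The per-device limits are illustrative, not tuned values.
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    cache_dir=".../.cache/huggingface/hub",
    device_map="auto",
    torch_dtype=torch.float16,
    max_memory={0: "8GiB", 1: "14GiB", 2: "14GiB", 3: "14GiB"},
)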

Thank you:)

I tried AutoProcessor.from_pretrained with do_image_splitting=False, but I am still getting the same error.

karambos changed discussion status to closed
karambos changed discussion status to open

I am trying to implement a chat model, but I am getting errors at this line:

inputs = {k: v.to("cuda") for k, v in inputs.items()}

How can I distribute the data across the different GPUs, or run the model in a lower-bit precision? The inputs are built with:

inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
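To make the lower-bit part of the question concrete, here is a minimal sketch of loading the model 4-bit quantized with bitsandbytes (this assumes bitsandbytes is installed; the settings are the standard BitsAndBytesConfig options, not values verified on this setup), and of sending the inputs to the device of the first shard instead of a blanket .to("cuda"):

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# Sketch: 4-bit NF4 quantization to shrink the per-GPU weight footprint.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    cache_dir=".../.cache/huggingface/hub",
    device_map="auto",
    quantization_config=quantization_config,
)

# With device_map="auto", accelerate's hooks move activations between GPUs,
# so the inputs only need to be on the device of the first shard.
inputs = {k: v.to(model.device) for k, v in inputs.items()}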

Hey @VictorSanh, any suggestions please?

Error solved!
I had forgotten the do_image_splitting=False parameter; after adding it, everything works well.
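For anyone hitting the same issue, the fix amounts to loading the processor with that flag (sketch, path elided as above):

processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    cache_dir=".../.cache/huggingface/hub",
    do_image_splitting=False,  # skip splitting each image into sub-images, which inflates the image token count
)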

karambos changed discussion status to closed
