CUDA out of memory
CUDA out of memory. Tried to allocate 12.78 GiB (GPU 0; 15.73 GiB total capacity; 11.21 GiB already allocated; 2.47 GiB free; 12.19 GiB reserved in total by PyTorch)
I have a cluster of 4 GPUs with 16 GB each.
GPU memory usage (MiB used / total) after model loading:
0: 9246/16300
1: 9246/16300
2: 9246/16300
3: 8038/16300
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", cache_dir=".../.cache/huggingface/hub")
model = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics2-8b", cache_dir=".../.cache/huggingface/hub", device_map="auto", torch_dtype=torch.float16)
With this change the weights are distributed across the GPUs, but the error occurs when I run:
generated_ids = model.generate(**inputs, max_new_tokens=60)
GPU memory usage (MiB used / total) while running this line:
0: 12700/16300
1: 9246/16300
2: 9246/16300
3: 8038/16300
I am getting an error here: CUDA out of memory. Tried to allocate 12.78 GiB (GPU 0; 15.73 GiB total capacity; 11.21 GiB already allocated; 2.47 GiB free; 12.19 GiB reserved in total by PyTorch)
Thank you:)
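One possible mitigation, as a sketch only (the per-GPU limits below are assumptions to tune, not measured values): device_map="auto" also accepts a max_memory mapping, which forces the dispatcher to leave headroom on GPU 0, the device that also receives the inputs and the activations during generation.

import torch
from transformers import AutoModelForVision2Seq

# Sketch: cap GPU 0 lower than the others so layers shift to GPUs 1-3,
# leaving room where generate() allocates activations and the KV cache.
# The "10GiB" / "14GiB" values are assumptions for 16 GB cards.
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    device_map="auto",
    torch_dtype=torch.float16,
    max_memory={0: "10GiB", 1: "14GiB", 2: "14GiB", 3: "14GiB"},
)

Capping GPU 0 pushes more layers onto the other cards, trading some cross-device traffic for free memory on the device where generation peaks.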
I tried AutoProcessor.from_pretrained with do_image_splitting=False, but I am still getting the same error.
I am trying to implement a chat model, but I am getting the error at this line:
inputs = {k: v.to("cuda") for k, v in inputs.items()}  # moves every input tensor to the default device, cuda:0
How can I distribute the data across the different GPUs, or generate with lower-bit precision (see the sketch below)?
inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
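On the lower-bit side, a minimal sketch (assuming bitsandbytes is installed; not verified on idefics2 specifically) is to load the model 4-bit quantized via quantization_config, which roughly quarters the weight footprint:

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# Assumption: bitsandbytes is available. NF4 stores the weights in
# ~4 bits each while computing in float16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    device_map="auto",
    quantization_config=quant_config,
)

As for input placement with a sharded model, moving the inputs to model.device (which resolves to the device of the first shard) is usually equivalent to the .to("cuda") above, so the OOM most likely comes from the footprint of the forward pass rather than from where the inputs land.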
Hey @VictorSanh, any suggestions please?
Error Solved!
I had forgotten to actually pass the do_image_splitting=False parameter; after adding it, everything works well.
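For anyone landing on this thread later, the working processor setup just adds the flag (cache path elided as in the original):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    cache_dir=".../.cache/huggingface/hub",
    do_image_splitting=False,  # encode each image whole instead of as several sub-crops
)

With splitting enabled, idefics2 encodes each image as multiple sub-images plus the original, which multiplies the image-token count per image; that is consistent with the huge 12.78 GiB allocation attempt during generate().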