An error occurred: shape mismatch
Hello,
Would someone be able to help me with this error?
My code prompts for a local image on my system and then runs it through the model. All the files are stored locally.
It seems the file is opened and is about to be processed, and then the error occurs (full log below).
My assumption is that it will analyze the image and provide a text description.
The only other issue I notice is that I do have a GPU, but it always uses the CPU. Could that be the cause?
2024-09-11 07:26:36,483 - INFO - Generating description for the image...
2024-09-11 07:26:36,508 - INFO - Image opened successfully. Original size: (512, 512)
2024-09-11 07:26:36,518 - INFO - Image resized to: (448, 448)
2024-09-11 07:26:36,521 - INFO - Model moved to cpu
2024-09-11 07:26:36,521 - INFO - Processing image with the processor...
2024-09-11 07:26:36,542 - INFO - Input tensor 'input_ids' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'attention_mask' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'pixel_values' shape: torch.Size([1024, 1176])
2024-09-11 07:26:36,542 - INFO - Input tensor 'image_grid_thw' shape: torch.Size([1, 3])
2024-09-11 07:26:36,543 - INFO - Generating output from the model...
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
2024-09-11 07:26:38,410 - ERROR - An error occurred: shape mismatch: value tensor of shape [256, 3584] cannot be broadcast to indexing result of shape [0, 3584]
None
Sorry if this is a duplicate message.
Thanks,
V
I think your text has no image token in it. Check whether your text contains the tokens <|vision_start|>, <|image_pad|>, and <|vision_end|>.
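For example, you can render the chat template and inspect the prompt text (a minimal check, assuming the processor and conversation from your setup):

text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
print(text)  # the prompt should contain <|vision_start|><|image_pad|><|vision_end|>
assert "<|image_pad|>" in text, "no image placeholder in the prompt text"

If the placeholder is missing, the model has zero positions to scatter the image features into, which matches the [0, 3584] indexing result in your error.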
Oh, I thought that was handled by qwen-vl-utils. I am not sure where to check this, but here is what I have:
import logging
import torch
from PIL import Image

try:
    from qwen_vl_utils import process_vision_info

    # Open the image and normalize to RGB
    image = Image.open(image_path).convert('RGB')
    logging.info(f"Image opened successfully. Original size: {image.size}, Mode: {image.mode}")

    # Resize the image
    image = image.resize((IMAGE_SIZE, IMAGE_SIZE))
    logging.info(f"Image resized to: {image.size}")

    # Device handling with explicit CUDA check
    if torch.cuda.is_available():
        device = torch.device("cuda")
        logging.info("CUDA-enabled GPU is available. Moving model and inputs to GPU.")
    else:
        device = torch.device("cpu")
        logging.info("No CUDA-enabled GPU found. Using CPU for processing.")
    model = model.to(device)

    # Build the chat-style conversation with the image and the prompt
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image, "image_id": image_id} if image_id else {"type": "image", "image": image},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ]
    # ... (processing and generation continue below this point)
except Exception as e:
    logging.error(f"An error occurred: {e}")
The tokens should be added by processor.apply_chat_template.
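For reference, a minimal sketch of the standard Qwen2-VL flow (assuming the model, processor, and conversation from your snippet; max_new_tokens is illustrative):

from qwen_vl_utils import process_vision_info

# apply_chat_template inserts <|vision_start|><|image_pad|><|vision_end|> around each image
text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
# Collect the image(s) referenced in the conversation
image_inputs, video_inputs = process_vision_info(conversation)
# Tokenize the templated text and preprocess the images together,
# so the number of placeholders matches pixel_values
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])

If you pass raw text to the processor without applying the chat template, the placeholders never make it into input_ids, which can produce exactly this kind of shape mismatch.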
Thank you!