Batch image captioning
Hi, is there a way to generate image captions in batches?
Hi, you can pass a list of images and a list of texts (prompts); it should work. If something goes wrong, please let me know.
Here is my code:

from PIL import Image

def run_example(images):
    prompt = "<grounding> Describe this image in detail:"
    batch_ppt = [prompt] * len(images)
    # inputs = processor(text=prompt, images=image, return_tensors="pt")
    inputs = processor(text=batch_ppt, images=images, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"][:, :-1],
        attention_mask=inputs["attention_mask"][:, :-1],
        img_features=None,
        img_attn_mask=inputs["img_attn_mask"][:, :-1],
        use_cache=True,
        max_new_tokens=128,
    )
    # note: the [0] keeps only the first caption in the batch
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    _processed_text = processor.post_process_generation(generated_text, cleanup_and_extract=False)
    processed_text, entities = processor.post_process_generation(generated_text)
    print(processed_text)
    # print(entities)
    # print(_processed_text)

img_path = '/prompts_data/snowman.jpg'
images = [Image.open(img_path)] * 3
run_example(images)
Then I get the following error:
Traceback (most recent call last):
File "/cfs-nj-gameai/joelrliu/prompts_data/ko.py", line 39, in <module>
generated_ids = model.generate(
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1739, in generate
output = self.text_model.generate(
File "/usr/miniconda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1596, in generate
return self.greedy_search(
File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2444, in greedy_search
outputs = self(
File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1362, in forward
outputs = self.model(
File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1068, in forward
hidden_states = self.forward_embedding(
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1010, in forward_embedding
inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features
RuntimeError: shape mismatch: value tensor of shape [3, 64, 2048] cannot be broadcast to indexing result of shape [192, 2048]
It seems the shapes are mismatched, so I tried to fix the code with a reshape, as follows:

inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features.reshape(-1, img_features.shape[-1])

The error is gone; however, I get an unexpected result for the prompt...
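For context, the mismatch in the traceback can be reproduced with a small, self-contained tensor sketch. The 3/64/2048 dimensions come from the error message; the sequence length and the mask layout below are assumptions for illustration only:

```python
import torch

# Image features come out as [batch, tokens, dim] = [3, 64, 2048], while the
# boolean mask selects 3 * 64 = 192 rows of the [batch, seq, dim] embeddings.
batch, tokens, dim, seq = 3, 64, 2048, 80
inputs_embeds = torch.zeros(batch, seq, dim)
img_features = torch.randn(batch, tokens, dim)

# Assumed layout: one contiguous run of 64 image positions per sequence.
img_input_mask = torch.zeros(batch, seq, dtype=torch.bool)
img_input_mask[:, :tokens] = True

# Direct assignment fails: [3, 64, 2048] cannot broadcast to [192, 2048].
# inputs_embeds[img_input_mask] = img_features  # RuntimeError

# Flattening the batch and token dimensions makes the shapes line up.
inputs_embeds[img_input_mask] = img_features.reshape(-1, dim)
print(int(img_input_mask.sum()))  # 192 masked positions, as in the error
```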
Hello again! I made a small change, and it should be able to run with batch examples now.
[Note!] The current code snippet (the [:, :-1] part below) won't work with batched examples if there is padding happening! But in your case there is no padding, so it's fine.

inputs["input_ids"][:, :-1]

There is an ongoing effort to port Kosmos-2 directly into transformers. This repository (remote code) might need some more bug fixes later, including some breaking changes.
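The padding caveat can be illustrated with a toy example. The token ids below are made up, and 0 plays the role of the pad id:

```python
import torch

# Two prompts of different lengths, right-padded to the same width.
# input_ids[:, :-1] drops the last column, which is a real token for the
# long prompt but only padding for the short one, so the short prompt's
# trailing token survives when it shouldn't.
input_ids = torch.tensor([
    [5, 6, 7, 8, 9],   # long prompt
    [5, 6, 9, 0, 0],   # short prompt, right-padded
])
trimmed = input_ids[:, :-1]
print(trimmed[0].tolist())  # [5, 6, 7, 8] -> trailing token 9 removed
print(trimmed[1].tolist())  # [5, 6, 9, 0] -> token 9 is still there
```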
Thanks for your response! Using view works~!
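For later readers: view and reshape give the same flattening here, because the feature tensor is contiguous (the dimensions below are taken from the error message above):

```python
import torch

# view requires a contiguous tensor; reshape falls back to a copy when it
# must. For a contiguous [3, 64, 2048] feature tensor, both flatten the
# batch and token dimensions into one, matching the 192 masked positions.
img_features = torch.randn(3, 64, 2048)
flat_view = img_features.view(-1, img_features.shape[-1])
flat_reshape = img_features.reshape(-1, img_features.shape[-1])
print(tuple(flat_view.shape))  # (192, 2048)
```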