Batch image captioning
Hi, is there a way to generate image captions in batches?
Hi, you can pass a list of images and a list of texts (prompts); it should work. If something goes wrong, please let me know.
Here is my code:

from PIL import Image

def run_example(images):
    prompt = "<grounding> Describe this image in detail:"
    batch_ppt = [prompt] * len(images)
    # inputs = processor(text=prompt, images=image, return_tensors="pt")
    inputs = processor(text=batch_ppt, images=images, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"][:, :-1],
        attention_mask=inputs["attention_mask"][:, :-1],
        img_features=None,
        img_attn_mask=inputs["img_attn_mask"][:, :-1],
        use_cache=True,
        max_new_tokens=128,
    )
    # note: the [0] keeps only the first caption in the batch
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    _processed_text = processor.post_process_generation(generated_text, cleanup_and_extract=False)
    processed_text, entities = processor.post_process_generation(generated_text)
    print(processed_text)
    # print(entities)
    # print(_processed_text)

img_path = '/prompts_data/snowman.jpg'
images = [Image.open(img_path)] * 3
run_example(images)
Then I get the following error:
Traceback (most recent call last):
File "/cfs-nj-gameai/joelrliu/prompts_data/ko.py", line 39, in <module>
generated_ids = model.generate(
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1739, in generate
output = self.text_model.generate(
File "/usr/miniconda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1596, in generate
return self.greedy_search(
File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2444, in greedy_search
outputs = self(
File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1362, in forward
outputs = self.model(
File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1068, in forward
hidden_states = self.forward_embedding(
File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1010, in forward_embedding
inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features
RuntimeError: shape mismatch: value tensor of shape [3, 64, 2048] cannot be broadcast to indexing result of shape [192, 2048]
It seems the shapes are mismatched, so I tried to fix the code with a reshape, as follows:

inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features.reshape(-1, img_features.shape[-1])

The error is gone; however, I get an unexpected result for the prompt...
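For context, the mismatch in the traceback can be reproduced with a small, self-contained tensor sketch. The 3/64/2048 dimensions come from the error message; the sequence length and the mask layout below are assumptions for illustration only:

```python
import torch

# Image features come out as [batch, tokens, dim] = [3, 64, 2048], while the
# boolean mask selects 3 * 64 = 192 rows of the [batch, seq, dim] embeddings.
batch, tokens, dim, seq = 3, 64, 2048, 80
inputs_embeds = torch.zeros(batch, seq, dim)
img_features = torch.randn(batch, tokens, dim)

# Assumed layout: one contiguous run of 64 image positions per sequence.
img_input_mask = torch.zeros(batch, seq, dtype=torch.bool)
img_input_mask[:, :tokens] = True

# Direct assignment fails: [3, 64, 2048] cannot broadcast to [192, 2048].
# inputs_embeds[img_input_mask] = img_features  # RuntimeError

# Flattening the batch and token dimensions makes the shapes line up.
inputs_embeds[img_input_mask] = img_features.reshape(-1, dim)
print(int(img_input_mask.sum()))  # 192 masked positions, as in the error
```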
Hello again! I made a small change, and it should be able to run with batch examples now.
[Note!] The current code snippet (the [:, :-1] part below) won't work with batched examples if there is padding happening! But in your case there is no padding, so it's fine.

inputs["input_ids"][:, :-1]

There is an ongoing effort to port Kosmos-2 directly into transformers. This repository (remote code) might need some more bug fixes later, including some breaking changes.
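The padding caveat can be illustrated with a toy example. The token ids below are made up, and 0 plays the role of the pad id:

```python
import torch

# Two prompts of different lengths, right-padded to the same width.
# input_ids[:, :-1] drops the last column, which is a real token for the
# long prompt but only padding for the short one, so the short prompt's
# trailing token survives when it shouldn't.
input_ids = torch.tensor([
    [5, 6, 7, 8, 9],   # long prompt
    [5, 6, 9, 0, 0],   # short prompt, right-padded
])
trimmed = input_ids[:, :-1]
print(trimmed[0].tolist())  # [5, 6, 7, 8] -> trailing token 9 removed
print(trimmed[1].tolist())  # [5, 6, 9, 0] -> token 9 is still there
```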
Thanks for your response! Using view works~!
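For later readers: view and reshape give the same flattening here, because the feature tensor is contiguous (the dimensions below are taken from the error message above):

```python
import torch

# view requires a contiguous tensor; reshape falls back to a copy when it
# must. For a contiguous [3, 64, 2048] feature tensor, both flatten the
# batch and token dimensions into one, matching the 192 masked positions.
img_features = torch.randn(3, 64, 2048)
flat_view = img_features.view(-1, img_features.shape[-1])
flat_reshape = img_features.reshape(-1, img_features.shape[-1])
print(tuple(flat_view.shape))  # (192, 2048)
```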