What do you think of "List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs"

by Shure-Dev - opened Apr 29

Discussion

Shure-Dev

Apr 29

https://arxiv.org/pdf/2404.16375

I'd like to know why you don't concatenate the multiple images into a single image and solve the task with prompt engineering alone.

wenhu

TIGER-Lab org May 18

That's the baseline we compared against across all the benchmarks. Also, concatenating images makes co-reference almost impossible. We don't think that's the way to go.
