SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models Paper • 2408.12114 • Published Aug 22
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing Paper • 2306.14435 • Published Jun 26, 2023