VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos • Paper • 2411.04923 • Published 9 days ago • 20
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs • Paper • 2406.10326 • Published Jun 14 • 1
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts • Paper • 2310.10640 • Published Oct 16, 2023 • 2