Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.04325

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29 • 49
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023 • 26
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Paper • 2311.16103 • Published Nov 27, 2023 • 1

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23
VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 22
Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10 • 22

Video Generation

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

Video Understanding

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10 • 22
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 42
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31 • 18

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Paper • 2406.06469 • Published Jun 10 • 23
Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published Jun 7 • 55
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7 • 41
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

Video-Gen LLM-based

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
Pandora: Towards General World Model with Natural Language Actions and Video States

Paper • 2406.09455 • Published Jun 12 • 14

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23
I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 15

VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 22
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

LanguageBind/MoE-LLaVA-Phi2-2.7B-4e

Text Generation • Updated Feb 1 • 58.7k • 38
LanguageBind/LanguageBind_Video_FT

Zero-Shot Image Classification • Updated Feb 1 • 593k • 4
stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10 • 466k • 2.67k
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29 • 12
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs