-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 40 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 29
Collections
Discover the best community collections!
Collections including paper arxiv:2402.12226
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 28 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 21 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 65
-
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20 -
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper • 2402.03162 • Published • 17 -
Rolling Diffusion Models
Paper • 2402.09470 • Published • 9 -
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 40
-
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 30 -
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Paper • 2401.08740 • Published • 12 -
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Paper • 2401.10061 • Published • 28 -
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices
Paper • 2311.16567 • Published • 22
-
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper • 2312.02087 • Published • 20 -
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 30 -
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper • 2312.02432 • Published • 12 -
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper • 2312.02981 • Published • 8
-
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 118 -
stabilityai/stable-video-diffusion-img2vid-xt
Image-to-Video • Updated • 494k • 2.67k -
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 50 -
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Paper • 2311.12454 • Published • 29
-
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Paper • 2310.09199 • Published • 24 -
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 14 -
Personality Traits in Large Language Models
Paper • 2307.00184 • Published • 20 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 14
-
Kosmos-2.5: A Multimodal Literate Model
Paper • 2309.11419 • Published • 50 -
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Paper • 2311.05698 • Published • 9 -
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper • 2311.06242 • Published • 84 -
PolyMaX: General Dense Prediction with Mask Transformer
Paper • 2311.05770 • Published • 6