-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
Collections
Discover the best community collections!
Collections including paper arxiv:2401.08541
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 64 -
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper • 2403.03206 • Published • 57 -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 16 -
Scalable Pre-training of Large Autoregressive Image Models
Paper • 2401.08541 • Published • 36
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 28 -
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
Paper • 2401.05252 • Published • 47 -
Scalable Pre-training of Large Autoregressive Image Models
Paper • 2401.08541 • Published • 36 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 59
-
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper • 2312.02087 • Published • 20 -
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 30 -
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper • 2312.02432 • Published • 12 -
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper • 2312.02981 • Published • 8