- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections
Collections including paper arxiv:2410.04932
- OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
  Paper • 2410.04932 • Published • 9
- ControlAR: Controllable Image Generation with Autoregressive Models
  Paper • 2410.02705 • Published • 7
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
  Paper • 2410.13370 • Published • 36
- GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
  Paper • 2410.20474 • Published • 13

- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 62
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 33
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 14
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 56

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 8
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 15
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 58
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 71

- DreamGaussian4D: Generative 4D Gaussian Splatting
  Paper • 2312.17142 • Published • 18
- Presto! Distilling Steps and Layers for Accelerating Music Generation
  Paper • 2410.05167 • Published • 15
- OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
  Paper • 2410.04932 • Published • 9
- Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
  Paper • 2410.11795 • Published • 16