-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 38 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2409.04410
-
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Paper • 2310.00426 • Published • 61 -
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 40 -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 32 -
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 21
-
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Paper • 2308.16582 • Published • 10 -
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
Paper • 2310.13119 • Published • 11 -
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 30 -
Text-to-3D with classifier score distillation
Paper • 2310.19415 • Published • 4