- Instruct-Imagen: Image Generation with Multi-modal Instruction
  Paper • 2401.01952 • Published • 30
- ODIN: A Single Model for 2D and 3D Perception
  Paper • 2401.02416 • Published • 11
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 20
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
  Paper • 2404.02747 • Published • 11

Collections including paper arxiv:2401.01952
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 13
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 45
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 37
- Aligning Large Multimodal Models with Factually Augmented RLHF
  Paper • 2309.14525 • Published • 29

- Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
  Paper • 2401.00935 • Published • 17
- Taming Mode Collapse in Score Distillation for Text-to-3D Generation
  Paper • 2401.00909 • Published • 9
- Q-Refine: A Perceptual Quality Refiner for AI-Generated Image
  Paper • 2401.01117 • Published • 8
- En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
  Paper • 2401.01173 • Published • 11

- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 69
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
  Paper • 2401.11708 • Published • 29
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 64
- PALP: Prompt Aligned Personalization of Text-to-Image Models
  Paper • 2401.06105 • Published • 46

- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  Paper • 2208.12242 • Published • 10
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  Paper • 2308.06721 • Published • 29
- h94/IP-Adapter-FaceID
  Text-to-Image • Updated • 502k • 1.58k
- PALP: Prompt Aligned Personalization of Text-to-Image Models
  Paper • 2401.06105 • Published • 46