RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization Paper • 2403.00483 • Published Mar 1 • 12
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4 • 28
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21 • 21
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7 • 40
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 42
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12 • 13
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 24
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba Paper • 2403.09977 • Published Mar 15 • 9
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm Paper • 2403.11781 • Published Mar 18 • 17
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18 • 16
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27 • 25
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 64
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7 • 24
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8 • 24
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 47
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24 • 19
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29 • 11
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 27
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published May 11 • 16
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14 • 19
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published May 16 • 16
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published May 23 • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation Paper • 2406.08392 • Published Jun 12 • 18
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14 • 21
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15 • 65
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24 • 54
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51
DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents Paper • 2407.03300 • Published Jul 3 • 11
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Paper • 2407.12594 • Published Jul 17 • 19
Artist: Aesthetically Controllable Text-Driven Stylization without Training Paper • 2407.15842 • Published Jul 22 • 13
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person Paper • 2407.16224 • Published Jul 23 • 24
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model Paper • 2407.16982 • Published Jul 24 • 40
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Paper • 2407.17952 • Published Jul 25 • 29
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Paper • 2408.04594 • Published Aug 8 • 14