Style Aligned Image Generation via Shared Attention
Abstract
Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods requiring fine-tuning and manual intervention to disentangle content from style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models. This approach also allows style-consistent images to be created from a reference style through a straightforward inversion operation. Evaluation of our method across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across varied inputs.
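The abstract describes keeping style consistent by letting images under generation share attention with a reference during diffusion. Below is a minimal, hedged sketch of what one such shared self-attention step could look like in PyTorch: target queries attend to their own keys/values concatenated with the reference image's keys/values from the same layer and diffusion step. The function name, tensor shapes, and the omission of the paper's additional details are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a shared self-attention step (assumed names/shapes,
# not the StyleAligned reference code).
import torch
import torch.nn.functional as F


def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Target queries attend to target AND reference keys/values.

    q_tgt, k_tgt, v_tgt: (B, N, D) projections of the images being generated
    k_ref, v_ref:        (1, N, D) projections taken from the reference image
                         at the same self-attention layer and diffusion step
    """
    B = q_tgt.shape[0]
    # Append the reference keys/values to every target sequence so that
    # each generated image can "borrow" the reference style statistics.
    k = torch.cat([k_tgt, k_ref.expand(B, -1, -1)], dim=1)
    v = torch.cat([v_tgt, v_ref.expand(B, -1, -1)], dim=1)
    return F.scaled_dot_product_attention(q_tgt, k, v)


if __name__ == "__main__":
    B, N, D = 2, 64, 32
    q, k, v = (torch.randn(B, N, D) for _ in range(3))
    k_ref, v_ref = torch.randn(1, N, D), torch.randn(1, N, D)
    out = shared_attention(q, k, v, k_ref, v_ref)
    print(out.shape)  # torch.Size([2, 64, 32])
```

In practice such a hook would replace the self-attention call inside a pretrained T2I diffusion model at sampling time only, which is consistent with the abstract's claim that no fine-tuning is needed; the reference keys/values can come either from an image generated in the same batch or from an inverted real image.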
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter (2023)
- InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser (2023)
- Text-Driven Image Editing via Learnable Regions (2023)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors (2023)
- ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation (2023)