---
license: openrail++
library_name: diffusers
pipeline_tag: text-to-image
tags:
- StableDiffusionXLPipeline
- StableDiffusionXLInpaintPipeline
- stable-diffusion-xl
- stable-diffusion-xl-inpainting
- stable-diffusion-xl-diffusers
- inpainting
---

This repository contains alternative or tuned versions of Stable Diffusion XL Base 1.0 in `.safetensors` format.

# Available Models

## sd_xl_base_1.0_fp16_vae.safetensors

This file contains the weights of [sd_xl_base_1.0.safetensors](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) merged with the weights of [sdxl_vae.safetensors](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) from MadeByOllin's SDXL FP16 VAE repository.

## sd_xl_base_1.0_inpainting_0.1.safetensors

This file contains the weights of `sd_xl_base_1.0_fp16_vae.safetensors` merged with the weights from [diffusers/stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1).

# How to Create an SDXL Inpainting Checkpoint from Any SDXL Checkpoint

Using the `.safetensors` files in this repository, you can calculate an inpainting model with the formula `A + (B - C)`, where:

- `A` is `sd_xl_base_1.0_inpainting_0.1.safetensors`
- `B` is your fine-tuned checkpoint
- `C` is `sd_xl_base_1.0_fp16_vae.safetensors`

A Python sketch of this merge is included at the end of this card.

Using [ENFUGUE](https://github.com/painebenjamin/app.enfugue.ai)'s Web UI:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/XLI5s3fubTup9qhThGs37.png)

You must use the two files in this repository specifically for this to work. The Diffusers team trained XL Inpainting using the FP16 XL VAE, so using a different XL base will result in an incorrect delta being applied to the inpainting checkpoint, and the resulting VAE will be nonsensical.

# Model Description

- Developed by: The Diffusers team
- Repackaged by: Benjamin Paine
- Model type: Diffusion-based text-to-image generative model
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).

# Uses

## Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

## Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

# Limitations and Bias

## Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
- When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version.
## Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
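
# Scripting the Merge

If you prefer to script the `A + (B - C)` merge rather than use a UI, the following is a minimal sketch using `safetensors` and PyTorch. The file name `my_finetuned_sdxl.safetensors` and the output path are placeholders for your own files, and the key-handling strategy (keeping `A`'s tensor when a key is missing from `B` or `C`, or when shapes differ, such as the inpainting UNet's 9-channel input convolution) is an assumption you should verify against your preferred merge tool.

```python
import torch
from safetensors.torch import load_file, save_file

# A: the inpainting base from this repository.
a = load_file("sd_xl_base_1.0_inpainting_0.1.safetensors")
# B: your fine-tuned SDXL checkpoint (placeholder name).
b = load_file("my_finetuned_sdxl.safetensors")
# C: the SDXL base with the fp16 VAE baked in, from this repository.
c = load_file("sd_xl_base_1.0_fp16_vae.safetensors")

merged = {}
for key, a_tensor in a.items():
    if key in b and key in c and a_tensor.shape == b[key].shape == c[key].shape:
        # A + (B - C): re-apply the fine-tune's delta on top of the inpainting base.
        delta = b[key].float() - c[key].float()
        merged[key] = (a_tensor.float() + delta).to(a_tensor.dtype)
    else:
        # Keys unique to the inpainting checkpoint, or shape mismatches such as the
        # 9-channel UNet input convolution, are carried over from A unchanged.
        merged[key] = a_tensor

save_file(merged, "my_finetuned_sdxl_inpainting.safetensors")
```

The output is a single-file checkpoint, so it should be loadable directly, for example with a recent Diffusers release that supports `from_single_file`:

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline

# Load the merged single-file checkpoint as an SDXL inpainting pipeline.
pipeline = StableDiffusionXLInpaintPipeline.from_single_file(
    "my_finetuned_sdxl_inpainting.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
```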