|
--- |
|
license: openrail++ |
|
library_name: diffusers |
|
pipeline_tag: text-to-image |
|
tags: |
|
- StableDiffusionXLPipeline |
|
- StableDiffusionXLInpaintPipeline |
|
- stable-diffusion-xl |
|
- stable-diffusion-xl-inpainting |
|
- stable-diffusion-xl-diffusers |
|
- inpainting |
|
--- |
|
|
|
This repository contains alternative or tuned versions of Stable Diffusion XL Base 1.0 in `.safetensors` format. |
|
|
|
# Available Models |
|
## sd_xl_base_1.0_fp16_vae.safetensors |
|
|
|
This file contains the weights of [sd_xl_base_1.0.safetensors](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), merged with the weights of [sdxl_vae.safetensors](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) from MadeByOllin's SDXL FP16 VAE repository. |
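
As a quick check, this file can be loaded as a single-file checkpoint with `diffusers`. A minimal sketch, assuming the file has been downloaded into the working directory and a CUDA device is available:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the merged base + fp16-fix VAE checkpoint from a single .safetensors file.
pipe = StableDiffusionXLPipeline.from_single_file(
    "sd_xl_base_1.0_fp16_vae.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```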
|
|
|
## sd_xl_base_1.0_inpainting_0.1.safetensors |
|
|
|
This file contains the weights of `sd_xl_base_1.0_fp16_vae.safetensors` merged with the weights from [diffusers/stable-diffusion-xl-1.0-inpainting-0.1](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1). |
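
A minimal usage sketch with `diffusers`, assuming a local copy of the checkpoint plus your own `input.png` and `mask.png` (white marks the region to repaint):

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

# Load the merged inpainting checkpoint from a single .safetensors file.
pipe = StableDiffusionXLInpaintPipeline.from_single_file(
    "sd_xl_base_1.0_inpainting_0.1.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("input.png").resize((1024, 1024))
mask = load_image("mask.png").resize((1024, 1024))  # white = area to repaint

result = pipe(
    prompt="a tiger sitting on a park bench",
    image=image,
    mask_image=mask,
    strength=0.85,  # keep below 1.0; see Limitations below
).images[0]
result.save("inpainted.png")
```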
|
|
|
# How to Create an SDXL Inpainting Checkpoint from any SDXL Checkpoint |
|
|
|
Using the `.safetensors` files in this repository, you can create an inpainting model from any fine-tuned SDXL checkpoint with the add-difference formula `A + (B - C)` (see the script sketch after this list), where:
|
- `A` is `sd_xl_base_1.0_inpainting_0.1.safetensors` |
|
- `B` is your fine-tuned checkpoint |
|
- `C` is `sd_xl_base_1.0_fp16_vae.safetensors` |
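
If you prefer to merge from a script, the same add-difference calculation can be done directly with the `safetensors` and `torch` libraries. This is a minimal sketch, assuming all three files are in the working directory; `my_finetune.safetensors` is a placeholder name for your checkpoint `B`:

```python
import torch
from safetensors.torch import load_file, save_file

a = load_file("sd_xl_base_1.0_inpainting_0.1.safetensors")  # A: inpainting base
b = load_file("my_finetune.safetensors")                    # B: your fine-tune (placeholder)
c = load_file("sd_xl_base_1.0_fp16_vae.safetensors")        # C: base + fp16 VAE

merged = {}
for key, a_tensor in a.items():
    if key in b and key in c and a_tensor.shape == b[key].shape == c[key].shape:
        # A + (B - C): apply the fine-tune's delta over the base
        # to the inpainting checkpoint, computed in float32 for precision.
        delta = b[key].float() - c[key].float()
        merged[key] = (a_tensor.float() + delta).to(a_tensor.dtype)
    else:
        # Keys unique to the inpainting model (e.g. the UNet's extra
        # mask input channels) are copied through unchanged.
        merged[key] = a_tensor

save_file(merged, "my_finetune_inpainting_0.1.safetensors")
```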
|
|
|
Alternatively, using [ENFUGUE](https://github.com/painebenjamin/app.enfugue.ai)'s Web UI:
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/XLI5s3fubTup9qhThGs37.png) |
|
|
|
You must use the two files from this repository for this merge to work. The Diffusers team trained SDXL Inpainting with the FP16-fixed XL VAE, so using a different XL base as `C` results in an incorrect delta being applied to the inpainting checkpoint, and the resulting VAE will be nonsensical.
|
|
|
# Model Description |
|
- Developed by: The Diffusers team |
|
- Repackaged by: Benjamin Paine |
|
- Model type: Diffusion-based text-to-image generative model |
|
- License: CreativeML Open RAIL++-M License |
|
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). |
|
|
|
# Uses |
|
## Direct Use |
|
|
|
The model is intended for research purposes only. Possible research areas and tasks include:
|
|
|
- Generation of artworks and use in design and other artistic processes. |
|
- Applications in educational or creative tools. |
|
- Research on generative models. |
|
- Safe deployment of models which have the potential to generate harmful content. |
|
- Probing and understanding the limitations and biases of generative models. |
|
Excluded uses are described below.
|
|
|
## Out-of-Scope Use |
|
|
|
The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore beyond its abilities and out of scope.
|
|
|
# Limitations and Bias |
|
## Limitations |
|
- The model does not achieve perfect photorealism.

- The model cannot render legible text.

- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.

- Faces and people in general may not be generated properly.

- The autoencoding part of the model is lossy.

- When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version.
|
## Bias |
|
- While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. |