---
license: other
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
inference: false
---
# SDXL-controlnet: OpenPose (v2)
These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with OpenPose (v2) conditioning. You can find some example images below.
prompt: a ballerina, romantic sunset, 4k photo
![ballerina](./screenshot_ballerina.png)
### Comfy Workflow
![comfy workflow](./out_ballerina.png)
(This image was generated with ComfyUI; you can drag and drop it into ComfyUI to load it as a workflow.)
License: follows the OpenPose license.
### Using in 🧨 diffusers
First, install the required libraries:
```bash
pip install -q controlnet_aux transformers accelerate
pip install -q git+https://github.com/huggingface/diffusers
```
Now, we're ready to make Darth Vader dance:
```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Compute the OpenPose conditioning image.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
openpose_image = openpose(image)

# Initialize the ControlNet pipeline.
controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
# Offload model components to the CPU when idle to reduce GPU memory usage.
pipe.enable_model_cpu_offload()

# Infer.
prompt = "Darth vader dancing in a desert, high quality"
negative_prompt = "low quality, bad quality"
images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    num_images_per_prompt=4,
    image=openpose_image.resize((1024, 1024)),
    generator=torch.manual_seed(97),
).images
# Display the first sample (e.g., in a notebook).
images[0]
```
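If you prefer sampling with UniPC instead of the pipeline's default scheduler, you can swap it in after loading; a minimal sketch:
```python
from diffusers import UniPCMultistepScheduler

# Swap the pipeline's default scheduler for UniPC, reusing its config.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
```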
Here are some generated examples:
![](./darth_vader_grid.png)
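To save your four samples as a grid like the one above, you can use diffusers' image-grid helper; a minimal sketch, continuing from the snippet above and assuming your diffusers version ships `make_image_grid`:
```python
from diffusers.utils import make_image_grid

# Arrange the four generated samples in a 2x2 grid and write it to disk.
grid = make_image_grid(images, rows=2, cols=2)
grid.save("darth_vader_grid.png")
```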
### Training
The model was trained with the official ControlNet training script by HF🤗, available [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
#### Training data
This checkpoint was first trained for 15,000 steps on LAION 6a, with images resized so that their smaller side is at most 768 pixels.
#### Compute
One 1xA100 machine (thanks a lot to HF🤗 for providing the compute!)
#### Batch size
Data parallel with a single-GPU batch size of 2 and 8 gradient accumulation steps, for an effective batch size of 16.
#### Hyper Parameters
Constant learning rate of 8e-5
#### Mixed precision
fp16
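Putting the reported settings together, a hypothetical launch command for the linked script might look as follows. The dataset name is a placeholder and the flags follow the diffusers ControlNet SDXL example; this is a sketch, not the exact command used for this checkpoint:
```bash
# Hypothetical sketch: reproduces the reported hyperparameters with the
# standard flags of diffusers' train_controlnet_sdxl.py example script.
accelerate launch train_controlnet_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --output_dir="controlnet-openpose-sdxl-1.0" \
  --dataset_name="<your-pose-dataset>" \
  --mixed_precision="fp16" \
  --resolution=768 \
  --learning_rate=8e-5 \
  --lr_scheduler="constant" \
  --train_batch_size=2 \
  --gradient_accumulation_steps=8 \
  --max_train_steps=15000
```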