patrickvonplaten and williamberman committed on
Commit 08db982
1 Parent(s): ea8efb3

README update (#1)

- README fixes (915eb173467ffdd23a48b10b729174da9a3743a7)


Co-authored-by: Will Berman <[email protected]>

README.md CHANGED
@@ -18,358 +18,48 @@ Controlnet's auxiliary models are trained with stable diffusion 1.5. Experimenta
18
  The auxiliary conditioning is passed directly to the diffusers pipeline. If you want to process an image to create the auxiliary conditioning, external dependencies are required.
19
 
20
  Some of the additional conditionings can be extracted from images via additional models. We extracted these
21
- additional models from the original controlnet repo into a separate package that can be found on [github](https://github.com/patrickvonplaten/human_pose.git).
22
 
23
- ## Canny edge detection
24
-
25
- Install opencv
26
-
27
- ```sh
28
- $ pip install opencv-contrib-python
29
- ```
30
-
31
- ```python
32
- import cv2
33
- from PIL import Image
34
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
35
- import torch
36
- import numpy as np
37
-
38
- image = Image.open('images/bird.png')
39
- image = np.array(image)
40
-
41
- low_threshold = 100
42
- high_threshold = 200
43
-
44
- image = cv2.Canny(image, low_threshold, high_threshold)
45
- image = image[:, :, None]
46
- image = np.concatenate([image, image, image], axis=2)
47
- image = Image.fromarray(image)
48
-
49
- controlnet = ControlNetModel.from_pretrained(
50
- "fusing/stable-diffusion-v1-5-controlnet-canny",
51
- )
52
-
53
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
54
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
55
- )
56
- pipe.to('cuda')
57
-
58
- image = pipe("bird", image).images[0]
59
-
60
- image.save('images/bird_canny_out.png')
61
- ```
62
-
63
- ![bird](./images/bird.png)
64
-
65
- ![bird_canny](./images/bird_canny.png)
66
-
67
- ![bird_canny_out](./images/bird_canny_out.png)
68
-
69
- ## M-LSD Straight line detection
70
-
71
- Install the additional controlnet models package.
72
-
73
- ```sh
74
- $ pip install git+https://github.com/patrickvonplaten/human_pose.git
75
- ```
76
-
77
- ```py
78
- from PIL import Image
79
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
80
- import torch
81
- from human_pose import MLSDdetector
82
-
83
- mlsd = MLSDdetector.from_pretrained('lllyasviel/ControlNet')
84
-
85
- image = Image.open('images/room.png')
86
-
87
- image = mlsd(image)
88
-
89
- controlnet = ControlNetModel.from_pretrained(
90
- "fusing/stable-diffusion-v1-5-controlnet-mlsd",
91
- )
92
-
93
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
94
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
95
- )
96
- pipe.to('cuda')
97
-
98
- image = pipe("room", image).images[0]
99
-
100
- image.save('images/room_mlsd_out.png')
101
- ```
102
-
103
- ![room](./images/room.png)
104
-
105
- ![room_mlsd](./images/room_mlsd.png)
106
-
107
- ![room_mlsd_out](./images/room_mlsd_out.png)
108
-
109
- ## Pose estimation
110
-
111
- Install the additional controlnet models package.
112
-
113
- ```sh
114
- $ pip install git+https://github.com/patrickvonplaten/human_pose.git
115
- ```
116
-
117
- ```py
118
- from PIL import Image
119
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
120
- import torch
121
- from human_pose import OpenposeDetector
122
-
123
- openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
124
-
125
- image = Image.open('images/pose.png')
126
-
127
- image = openpose(image)
128
-
129
- controlnet = ControlNetModel.from_pretrained(
130
- "fusing/stable-diffusion-v1-5-controlnet-openpose",
131
- )
132
-
133
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
134
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
135
- )
136
- pipe.to('cuda')
137
-
138
- image = pipe("chef in the kitchen", image).images[0]
139
-
140
- image.save('images/chef_pose_out.png')
141
- ```
142
-
143
- ![pose](./images/pose.png)
144
-
145
- ![openpose](./images/openpose.png)
146
-
147
- ![chef_pose_out](./images/chef_pose_out.png)
148
-
149
- ## Semantic Segmentation
150
-
151
- Semantic segmentation relies on transformers. Transformers is a
152
- dependency of diffusers for running controlnet, so you should
153
- have it installed already.
154
-
155
- ```py
156
- from transformers import AutoImageProcessor, UperNetForSemanticSegmentation
157
- from PIL import Image
158
- import numpy as np
159
- from controlnet_utils import ade_palette
160
- import torch
161
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
162
-
163
- image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small")
164
- image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small")
165
-
166
- image = Image.open("./images/house.png").convert('RGB')
167
-
168
- pixel_values = image_processor(image, return_tensors="pt").pixel_values
169
-
170
- with torch.no_grad():
171
- outputs = image_segmentor(pixel_values)
172
-
173
- seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
174
-
175
- color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) # height, width, 3
176
-
177
- palette = np.array(ade_palette())
178
-
179
- for label, color in enumerate(palette):
180
- color_seg[seg == label, :] = color
181
-
182
- color_seg = color_seg.astype(np.uint8)
183
-
184
- image = Image.fromarray(color_seg)
185
-
186
- controlnet = ControlNetModel.from_pretrained(
187
- "fusing/stable-diffusion-v1-5-controlnet-seg",
188
- )
189
-
190
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
191
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
192
- )
193
- pipe.to('cuda')
194
-
195
- image = pipe("house", image).images[0]
196
-
197
- image.save('./images/house_seg_out.png')
198
- ```
199
-
200
- ![house](images/house.png)
201
-
202
- ![house_seg](images/house_seg.png)
203
-
204
- ![house_seg_out](images/house_seg_out.png)
205
-
206
- ## Depth control
207
-
208
- Depth control relies on transformers. Transformers is a dependency of diffusers for running controlnet, so
209
- you should have it installed already.
210
-
211
- ```py
212
- from transformers import pipeline
213
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
214
- from PIL import Image
215
- import numpy as np
216
-
217
- depth_estimator = pipeline('depth-estimation')
218
-
219
- image = Image.open('./images/stormtrooper.png')
220
- image = depth_estimator(image)['depth']
221
- image = np.array(image)
222
- image = image[:, :, None]
223
- image = np.concatenate([image, image, image], axis=2)
224
- image = Image.fromarray(image)
225
-
226
- controlnet = ControlNetModel.from_pretrained(
227
- "fusing/stable-diffusion-v1-5-controlnet-depth",
228
- )
229
-
230
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
231
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
232
- )
233
- pipe.to('cuda')
234
-
235
- image = pipe("Stormtrooper's lecture", image).images[0]
236
-
237
- image.save('./images/stormtrooper_depth_out.png')
238
- ```
239
-
240
- ![stormtrooper](./images/stormtrooper.png)
241
-
242
- ![stormtrooler_depth](./images/stormtrooper_depth.png)
243
-
244
- ![stormtrooler_depth_out](./images/stormtrooper_depth_out.png)
245
-
246
-
247
- ## Normal map
248
-
249
- ```py
250
- from PIL import Image
251
- from transformers import pipeline
252
- import numpy as np
253
- import cv2
254
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
255
-
256
- image = Image.open("images/toy.png").convert("RGB")
257
-
258
- depth_estimator = pipeline("depth-estimation", model ="Intel/dpt-hybrid-midas" )
259
-
260
- image = depth_estimator(image)['predicted_depth'][0]
261
-
262
- image = image.numpy()
263
-
264
- image_depth = image.copy()
265
- image_depth -= np.min(image_depth)
266
- image_depth /= np.max(image_depth)
267
-
268
- bg_threhold = 0.4
269
-
270
- x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
271
- x[image_depth < bg_threhold] = 0
272
-
273
- y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
274
- y[image_depth < bg_threhold] = 0
275
-
276
- z = np.ones_like(x) * np.pi * 2.0
277
-
278
- image = np.stack([x, y, z], axis=2)
279
- image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
280
- image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
281
- image = Image.fromarray(image)
282
-
283
- controlnet = ControlNetModel.from_pretrained(
284
- "fusing/stable-diffusion-v1-5-controlnet-normal",
285
- )
286
-
287
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
288
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
289
- )
290
- pipe.to('cuda')
291
-
292
- image = pipe("cute toy", image).images[0]
293
-
294
- image.save('images/toy_normal_out.png')
295
- ```
296
-
297
- ![toy](./images/toy.png)
298
-
299
- ![toy_normal](./images/toy_normal.png)
300
-
301
- ![toy_normal_out](./images/toy_normal_out.png)
302
 
303
- ## Scribble
304
 
305
  Install the additional controlnet models package.
306
 
307
  ```sh
308
- $ pip install git+https://github.com/patrickvonplaten/human_pose.git
309
  ```
310
 
311
  ```py
312
  from PIL import Image
313
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
314
  import torch
315
- from human_pose import HEDdetector
316
 
317
  hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
318
 
319
- image = Image.open('images/bag.png')
320
 
321
- image = hed(image, scribble=True)
322
 
323
  controlnet = ControlNetModel.from_pretrained(
324
- "fusing/stable-diffusion-v1-5-controlnet-scribble",
325
  )
326
 
327
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
328
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
329
  )
330
- pipe.to('cuda')
331
-
332
- image = pipe("bag", image).images[0]
333
-
334
- image.save('images/bag_scribble_out.png')
335
- ```
336
-
337
- ![bag](./images/bag.png)
338
-
339
- ![bag_scribble](./images/bag_scribble.png)
340
-
341
- ![bag_scribble_out](./images/bag_scribble_out.png)
342
 
343
- ## HED Boundary
344
-
345
- Install the additional controlnet models package.
346
-
347
- ```sh
348
- $ pip install git+https://github.com/patrickvonplaten/human_pose.git
349
- ```
350
-
351
- ```py
352
- from PIL import Image
353
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
354
- import torch
355
- from human_pose import HEDdetector
356
-
357
- hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
358
 
359
- image = Image.open('images/man.png')
360
 
361
- image = hed(image)
362
 
363
- controlnet = ControlNetModel.from_pretrained(
364
- "fusing/stable-diffusion-v1-5-controlnet-hed",
365
- )
366
-
367
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
368
- "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
369
- )
370
- pipe.to('cuda')
371
-
372
- image = pipe("oil painting of handsome old man, masterpiece", image).images[0]
373
 
374
  image.save('images/man_hed_out.png')
375
  ```
@@ -378,4 +68,8 @@ image.save('images/man_hed_out.png')
378
 
379
  ![man_hed](./images/man_hed.png)
380
 
381
- ![man_hed_out](./images/man_hed_out.png)
18
  The auxiliary conditioning is passed directly to the diffusers pipeline. If you want to process an image to create the auxiliary conditioning, external dependencies are required.
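For instance, if a conditioning image has already been prepared by an external tool, it can be passed straight to the pipeline with no extra packages. A minimal sketch of that direct path: the file name `images/control.png` and the prompt are placeholders, and the model IDs are the ones used in the example below.

```py
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load a conditioning image that was produced beforehand (placeholder path).
control_image = Image.open('images/control.png')

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to('cuda')

# The conditioning image is passed directly to the pipeline call.
image = pipe("a placeholder prompt", control_image).images[0]
```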
19
 
20
  Some of the additional conditionings can be extracted from images via additional models. We extracted these
21
+ additional models from the original controlnet repo into a separate package that can be found on [github](https://github.com/patrickvonplaten/controlnet_aux.git).
22
 
23
+ ## HED Boundary
24
 
25
+ ### Diffusers
26
 
27
  Install the additional controlnet models package.
28
 
29
  ```sh
30
+ $ pip install git+https://github.com/patrickvonplaten/controlnet_aux.git
31
  ```
32
 
33
  ```py
34
  from PIL import Image
35
+ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
36
  import torch
37
+ from controlnet_aux import HEDdetector
38
 
39
  hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
40
 
41
+ image = Image.open('images/man.png')
42
 
43
+ image = hed(image)
44
 
45
  controlnet = ControlNetModel.from_pretrained(
46
+ "fusing/stable-diffusion-v1-5-controlnet-hed", torch_dtype=torch.float16
47
  )
48
 
49
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
50
+ "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
51
  )
52
 
53
+ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
54
 
55
+ # Remove if you do not have xformers installed
56
+ # see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
57
+ # for installation instructions
58
+ pipe.enable_xformers_memory_efficient_attention()
59
 
60
+ pipe.enable_model_cpu_offload()
61
 
62
+ image = pipe("oil painting of handsome old man, masterpiece", image, num_inference_steps=20).images[0]
 
 
 
 
 
 
 
 
 
63
 
64
  image.save('images/man_hed_out.png')
65
  ```
 
68
 
69
  ![man_hed](./images/man_hed.png)
70
 
71
+ ![man_hed_out](./images/man_hed_out.png)
72
+
73
+ ### Training
74
+
75
+ The HED Edge model was trained on 3M edge-image/caption pairs for 600 GPU-hours on Nvidia A100 80GB GPUs, using Stable Diffusion 1.5 as the base model.
controlnet_utils.py DELETED
@@ -1,40 +0,0 @@
1
- def ade_palette():
2
- """ADE20K palette that maps each class to RGB values."""
3
- return [[120, 120, 120], [180, 120, 120], [6, 230, 230], [80, 50, 50],
4
- [4, 200, 3], [120, 120, 80], [140, 140, 140], [204, 5, 255],
5
- [230, 230, 230], [4, 250, 7], [224, 5, 255], [235, 255, 7],
6
- [150, 5, 61], [120, 120, 70], [8, 255, 51], [255, 6, 82],
7
- [143, 255, 140], [204, 255, 4], [255, 51, 7], [204, 70, 3],
8
- [0, 102, 200], [61, 230, 250], [255, 6, 51], [11, 102, 255],
9
- [255, 7, 71], [255, 9, 224], [9, 7, 230], [220, 220, 220],
10
- [255, 9, 92], [112, 9, 255], [8, 255, 214], [7, 255, 224],
11
- [255, 184, 6], [10, 255, 71], [255, 41, 10], [7, 255, 255],
12
- [224, 255, 8], [102, 8, 255], [255, 61, 6], [255, 194, 7],
13
- [255, 122, 8], [0, 255, 20], [255, 8, 41], [255, 5, 153],
14
- [6, 51, 255], [235, 12, 255], [160, 150, 20], [0, 163, 255],
15
- [140, 140, 140], [250, 10, 15], [20, 255, 0], [31, 255, 0],
16
- [255, 31, 0], [255, 224, 0], [153, 255, 0], [0, 0, 255],
17
- [255, 71, 0], [0, 235, 255], [0, 173, 255], [31, 0, 255],
18
- [11, 200, 200], [255, 82, 0], [0, 255, 245], [0, 61, 255],
19
- [0, 255, 112], [0, 255, 133], [255, 0, 0], [255, 163, 0],
20
- [255, 102, 0], [194, 255, 0], [0, 143, 255], [51, 255, 0],
21
- [0, 82, 255], [0, 255, 41], [0, 255, 173], [10, 0, 255],
22
- [173, 255, 0], [0, 255, 153], [255, 92, 0], [255, 0, 255],
23
- [255, 0, 245], [255, 0, 102], [255, 173, 0], [255, 0, 20],
24
- [255, 184, 184], [0, 31, 255], [0, 255, 61], [0, 71, 255],
25
- [255, 0, 204], [0, 255, 194], [0, 255, 82], [0, 10, 255],
26
- [0, 112, 255], [51, 0, 255], [0, 194, 255], [0, 122, 255],
27
- [0, 255, 163], [255, 153, 0], [0, 255, 10], [255, 112, 0],
28
- [143, 255, 0], [82, 0, 255], [163, 255, 0], [255, 235, 0],
29
- [8, 184, 170], [133, 0, 255], [0, 255, 92], [184, 0, 255],
30
- [255, 0, 31], [0, 184, 255], [0, 214, 255], [255, 0, 112],
31
- [92, 255, 0], [0, 224, 255], [112, 224, 255], [70, 184, 160],
32
- [163, 0, 255], [153, 0, 255], [71, 255, 0], [255, 0, 163],
33
- [255, 204, 0], [255, 0, 143], [0, 255, 235], [133, 255, 0],
34
- [255, 0, 235], [245, 0, 255], [255, 0, 122], [255, 245, 0],
35
- [10, 190, 212], [214, 255, 0], [0, 204, 255], [20, 0, 255],
36
- [255, 255, 0], [0, 153, 255], [0, 41, 255], [0, 255, 204],
37
- [41, 0, 255], [41, 255, 0], [173, 0, 255], [0, 245, 255],
38
- [71, 0, 255], [122, 0, 255], [0, 255, 184], [0, 92, 255],
39
- [184, 255, 0], [0, 133, 255], [255, 214, 0], [25, 194, 194],
40
- [102, 255, 0], [92, 0, 255]]

images/bag.png DELETED
Binary file (462 kB)
 
images/bag_scribble.png DELETED
Binary file (11 kB)
 
images/bag_scribble_out.png DELETED
Binary file (556 kB)
 
images/bird.png DELETED
Binary file (1.07 MB)

images/bird_canny.png DELETED
Binary file (29.1 kB)
 
images/bird_canny_out.png DELETED
Binary file (845 kB)
 
images/chef_pose_out.png DELETED
Binary file (570 kB)
 
images/house.png DELETED
Binary file (391 kB)
 
images/house_seg.png DELETED
Binary file (3.68 kB)
 
images/house_seg_out.png DELETED
Binary file (472 kB)
 
images/man_hed_out.png CHANGED
images/openpose.png DELETED
Binary file (6.55 kB)
 
images/pose.png DELETED
Binary file (592 kB)
 
images/room.png DELETED
Binary file (637 kB)
 
images/room_mlsd.png DELETED
Binary file (9.06 kB)
 
images/room_mlsd_out.png DELETED
Binary file (575 kB)
 
images/stormtrooper.png DELETED
Binary file (218 kB)
 
images/stormtrooper_depth.png DELETED
Binary file (54.1 kB)
 
images/stormtrooper_depth_out.png DELETED
Binary file (343 kB)
 
images/toy.png DELETED
Binary file (312 kB)
 
images/toy_normal.png DELETED
Binary file (90.1 kB)
 
images/toy_normal_out.png DELETED
Binary file (231 kB)