# Textual inversion

[[open-in-colab]]

[`StableDiffusionPipeline`]์€  textual-inversion์„ ์ง€์›ํ•˜๋Š”๋ฐ, ์ด๋Š” ๋ช‡ ๊ฐœ์˜ ์ƒ˜ํ”Œ ์ด๋ฏธ์ง€๋งŒ์œผ๋กœ stable diffusion๊ณผ ๊ฐ™์€ ๋ชจ๋ธ์ด ์ƒˆ๋กœ์šด ์ปจ์…‰์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋ฅผ ๋” ์ž˜ ์ œ์–ดํ•˜๊ณ  ํŠน์ • ์ปจ์…‰์— ๋งž๊ฒŒ ๋ชจ๋ธ์„ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ๋งŒ๋“ค์–ด์ง„ ์ปจ์…‰๋“ค์˜ ์ปฌ๋ ‰์…˜์€ [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer)๋ฅผ ํ†ตํ•ด ๋น ๋ฅด๊ฒŒ ์‚ฌ์šฉํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” Stable Diffusion Conceptualizer์—์„œ ์‚ฌ์ „ํ•™์Šตํ•œ ์ปจ์…‰์„ ์‚ฌ์šฉํ•˜์—ฌ textual-inversion์œผ๋กœ ์ถ”๋ก ์„ ์‹คํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ๋“œ๋ฆฝ๋‹ˆ๋‹ค. textual-inversion์œผ๋กœ ๋ชจ๋ธ์— ์ƒˆ๋กœ์šด ์ปจ์…‰์„ ํ•™์Šต์‹œํ‚ค๋Š” ๋ฐ ๊ด€์‹ฌ์ด ์žˆ์œผ์‹œ๋‹ค๋ฉด,  [Textual Inversion](./training/text_inversion)  ํ›ˆ๋ จ ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

Log in to your Hugging Face account:

```py
from huggingface_hub import notebook_login

notebook_login()
```

ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋ฅผ ์‹œ๊ฐํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๋„์šฐ๋ฏธ ํ•จ์ˆ˜ `image_grid`๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค:

```py
import torch
from PIL import Image

from diffusers import StableDiffusionPipeline


def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```

Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):

```py
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"
```

์ด์ œ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋กœ๋“œํ•˜๊ณ  ์‚ฌ์ „ํ•™์Šต๋œ ์ปจ์…‰์„ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

```py
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

pipeline.load_textual_inversion(repo_id_embeds)
```

Create a prompt with the pre-learned concept by using the special placeholder token `<cat-toy>`, and choose the number of samples and rows of images you'd like to generate:

```py
prompt = "a grafitti in a favela wall with a <cat-toy> on it"

num_samples = 2
num_rows = 2
```

๊ทธ๋Ÿฐ ๋‹ค์Œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•˜๊ณ , ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€๋“ค์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ฒ˜์Œ์— ๋งŒ๋“ค์—ˆ๋˜ ๋„์šฐ๋ฏธ ํ•จ์ˆ˜ `image_grid`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ƒ์„ฑ ๊ฒฐ๊ณผ๋“ค์„ ์‹œ๊ฐํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋•Œ `num_inference_steps`์™€ `guidance_scale`๊ณผ ๊ฐ™์€ ๋งค๊ฐœ ๋ณ€์ˆ˜๋“ค์„ ์กฐ์ •ํ•˜์—ฌ, ์ด๊ฒƒ๋“ค์ด ์ด๋ฏธ์ง€ ํ’ˆ์งˆ์— ์–ด๋– ํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€๋ฅผ ์ž์œ ๋กญ๊ฒŒ ํ™•์ธํ•ด๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.

```py
all_images = []
for _ in range(num_rows):
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

grid = image_grid(all_images, num_rows, num_samples)
grid
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png">
</div>