init commit
- .gitattributes +2 -0
- README.md +148 -1
- feature_extractor/preprocessor_config.json +28 -0
- model.ckpt +3 -0
- model_index.json +33 -0
- safety_checker/config.json +181 -0
- safety_checker/pytorch_model.bin +3 -0
- scheduler/scheduler_config.json +13 -0
- text_encoder/config.json +34 -0
- text_encoder/pytorch_model.bin +3 -0
- tokenizer/special_tokens_map.json +7 -0
- tokenizer/tokenizer_config.json +16 -0
- tokenizer/vocab.txt +0 -0
- unet/config.json +45 -0
- unet/diffusion_pytorch_model.bin +3 -0
- vae/config.json +30 -0
- vae/diffusion_pytorch_model.bin +3 -0
.gitattributes
CHANGED
@@ -32,3 +32,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
32 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
33 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
34 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
35 |
+
*.png filter=lfs diff=lfs merge=lfs -text
|
36 |
+
*.jpg filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
@@ -1,3 +1,150 @@
|
|
1 |
---
|
2 |
-
|
3 |
---
|
1 |
---
|
2 |
+
language: zh
|
3 |
+
license: creativeml-openrail-m
|
4 |
+
|
5 |
+
tags:
|
6 |
+
- stable-diffusion
|
7 |
+
- stable-diffusion-diffusers
|
8 |
+
- text-to-image
|
9 |
+
- zh
|
10 |
+
- Chinese
|
11 |
+
- Anime
|
12 |
+
|
13 |
+
inference: true
|
14 |
+
widget:
|
15 |
+
- text: "1个女孩,美丽,可爱"
|
16 |
+
example_title: 1个女孩
|
17 |
+
- text: "1个男孩,帅气脸"
|
18 |
+
example_title: 1个男孩
|
19 |
+
|
20 |
+
|
21 |
+
extra_gated_prompt: |-
|
22 |
+
One more step before getting this model.
|
23 |
+
This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
|
24 |
+
The CreativeML OpenRAIL License specifies:
|
25 |
+
|
26 |
+
1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content
|
27 |
+
2. IDEA-CCNL claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
|
28 |
+
3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)
|
29 |
+
Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
|
30 |
+
|
31 |
+
By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.
|
32 |
+
extra_gated_fields:
|
33 |
+
I have read the License and agree with its terms: checkbox
|
34 |
---
|
35 |
+
|
36 |
+
# Taiyi-Stable-Diffusion-1B-Chinese-v0.1
|
37 |
+
|
38 |
+
- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
|
39 |
+
- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)
|
40 |
+
- API: [Fengshen-OpenAPI](https://fengshenbang-lm.com/open-api)
|
41 |
+
|
42 |
+
## 简介 Brief Introduction
|
43 |
+
|
44 |
+
首个开源的中文Stable Diffusion动漫模型,基于100万筛选过的动漫中文图文对训练。
|
45 |
+
|
46 |
+
The first open-source Chinese Stable Diffusion anime model, trained on 1 million filtered Chinese anime image-text pairs.
|
47 |
+
|
48 |
+
## 模型分类 Model Taxonomy
|
49 |
+
|
50 |
+
| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
|
51 |
+
| :----: | :----: | :----: | :----: | :----: | :----: |
|
52 |
+
| 特殊 Special | 多模态 Multimodal | 太乙 Taiyi | Stable Diffusion | 1B | Chinese |
|
53 |
+
|
54 |
+
## 模型信息 Model Information
|
55 |
+
|
56 |
+
我们将[Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/)数据集(100M)和[Zero](https://zero.so.com/)数据集(23M)用作预训练的数据集,先用[IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese)对这两个数据集的图文对相似性进行打分,取CLIP Score大于0.2的图文对作为我们的训练集。 我们使用[IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese)作为初始化的text encoder,冻住[stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)([论文](https://arxiv.org/abs/2112.10752))模型的其他部分,只训练text encoder,以便保留原始模型的生成能力且实现中文概念的对齐。该模型目前在0.2亿图文对上训练了一个epoch。 我们在 32 x A100 训练了大约100小时。该版本只是一个初步的版本,我们将持续优化并开源后续模型,欢迎交流。
|
57 |
+
|
58 |
+
We use [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) (100M) and [Zero](https://zero.so.com/) (23M) as our pretraining datasets. We score the similarity of every image-text pair with [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) and keep the pairs whose CLIP Score is greater than 0.2 as our training set. We also use [IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese](https://huggingface.co/IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) to initialize the text encoder. To keep the generative capability of Stable Diffusion while aligning Chinese concepts with images, we train only the text encoder and freeze the other parts of the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) ([paper](https://arxiv.org/abs/2112.10752)) model. So far the model has been trained for one epoch on 20 million image-text pairs, which took about 100 hours on 32 x A100. This is a preliminary version; we will keep improving the model and open-sourcing later versions. Feedback is welcome!
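
The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the CLIP Score is the cosine similarity between an image embedding and a text embedding (the README does not spell this out), and the toy embeddings and the helper names `cosine_similarity` and `filter_pairs` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=0.2):
    """Keep the captions whose image/text similarity exceeds the threshold.

    Each item is (caption, image_embedding, text_embedding). In a real
    pipeline the embeddings would come from a CLIP model; here they are
    hand-written toy vectors.
    """
    kept = []
    for caption, image_emb, text_emb in pairs:
        if cosine_similarity(image_emb, text_emb) > threshold:
            kept.append(caption)
    return kept


# Hypothetical toy data: one well-aligned pair and two poorly aligned ones.
toy_pairs = [
    ("matching caption", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ("unrelated caption", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ("weak match", [1.0, 0.0, 0.0], [0.1, 1.0, 0.0]),
]

kept_captions = filter_pairs(toy_pairs)
print(kept_captions)
```

Only the first pair clears the 0.2 threshold in this toy data; the other two are dropped, which is how the threshold trims noisy web-scraped pairs.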
|
59 |
+
|
60 |
+
### Result
|
61 |
+
Basic Prompt
|
62 |
+
|
63 |
+
| 铁马冰河入梦来,3D绘画。 | 飞流直下三千尺,油画。 | 女孩背影,日落,唯美插画。 |
|
64 |
+
| ---- | ---- | ---- |
|
65 |
+
| ![](result_examples/tiema.png) | ![](result_examples/feiliu.png) | ![](result_examples/nvhai.jpg) |
|
66 |
+
|
67 |
+
Advanced Prompt
|
68 |
+
|
69 |
+
| 铁马冰河入梦来,概念画,科幻,玄幻,3D | 中国海边城市,科幻,未来感,唯美,插画。 | 那人却在灯火阑珊处,色彩艳丽,古风,资深插画师作品,桌面高清壁纸。 |
|
70 |
+
| ---- | ---- | ---- |
|
71 |
+
| ![](result_examples/tiema2.jpg) | ![](result_examples/chengshi.jpg) | ![](result_examples/naren.jpg) |
|
72 |
+
|
73 |
+
|
74 |
+
## 使用 Usage
|
75 |
+
|
76 |
+
### 全精度 Full precision
|
77 |
+
|
78 |
+
```py
|
79 |
+
from diffusers import StableDiffusionPipeline
|
80 |
+
|
81 |
+
pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Anime-Chinese-v0.1").to("cuda")
|
82 |
+
|
83 |
+
prompt = '1个女孩,美丽,可爱'
|
84 |
+
image = pipe(prompt, guidance_scale=7.5).images[0]
|
85 |
+
image.save("1个女孩.png")
|
86 |
+
```
|
87 |
+
|
88 |
+
### 半精度 Half precision FP16 (CUDA)
|
89 |
+
|
90 |
+
Adding `torch_dtype=torch.float16` and `device_map="auto"` loads the FP16 weights quickly and speeds up inference.
|
91 |
+
For more information, see [the optimization docs](https://huggingface.co/docs/diffusers/main/en/optimization/fp16#half-precision-weights).
|
92 |
+
|
93 |
+
```py
|
94 |
+
# !pip install git+https://github.com/huggingface/accelerate
|
95 |
+
import torch
|
96 |
+
from diffusers import StableDiffusionPipeline
|
97 |
+
torch.backends.cudnn.benchmark = True
|
98 |
+
pipe = StableDiffusionPipeline.from_pretrained("IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Anime-Chinese-v0.1", torch_dtype=torch.float16)
|
99 |
+
pipe.to('cuda')
|
100 |
+
|
101 |
+
prompt = '1个女孩,美丽,可爱'
|
102 |
+
image = pipe(prompt, guidance_scale=7.5).images[0]
|
103 |
+
image.save("1个女孩.png")
|
104 |
+
```
|
105 |
+
|
106 |
+
### 使用手册 Handbook for Taiyi
|
107 |
+
|
108 |
+
https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/examples/stable_diffusion_chinese/taiyi_handbook.md
|
109 |
+
|
110 |
+
### 怎样微调 How to finetune
|
111 |
+
|
112 |
+
https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/finetune_taiyi_stable_diffusion
|
113 |
+
|
114 |
+
### webui配置 Configure webui
|
115 |
+
|
116 |
+
https://github.com/IDEA-CCNL/stable-diffusion-webui/blob/master/README.md
|
117 |
+
|
118 |
+
### DreamBooth
|
119 |
+
|
120 |
+
https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/stable_diffusion_dreambooth
|
121 |
+
|
122 |
+
## 引用 Citation
|
123 |
+
|
124 |
+
如果您在您的工作中使用了我们的模型,可以引用我们的[总论文](https://arxiv.org/abs/2209.02970):
|
125 |
+
|
126 |
+
If you use this resource in your work, please cite our [paper](https://arxiv.org/abs/2209.02970):
|
127 |
+
|
128 |
+
```text
|
129 |
+
@article{fengshenbang,
|
130 |
+
author = {Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen and Ruyi Gan and Jiaxing Zhang},
|
131 |
+
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
|
132 |
+
journal = {CoRR},
|
133 |
+
volume = {abs/2209.02970},
|
134 |
+
year = {2022}
|
135 |
+
}
|
136 |
+
```
|
137 |
+
|
138 |
+
也可以引用我们的[网站](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
|
139 |
+
|
140 |
+
You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
|
141 |
+
|
142 |
+
```text
|
143 |
+
@misc{Fengshenbang-LM,
|
144 |
+
title={Fengshenbang-LM},
|
145 |
+
author={IDEA-CCNL},
|
146 |
+
year={2021},
|
147 |
+
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
|
148 |
+
}
|
149 |
+
```
|
150 |
+
|
feature_extractor/preprocessor_config.json
ADDED
@@ -0,0 +1,28 @@
|
1 |
+
{
|
2 |
+
"crop_size": {
|
3 |
+
"height": 224,
|
4 |
+
"width": 224
|
5 |
+
},
|
6 |
+
"do_center_crop": true,
|
7 |
+
"do_convert_rgb": true,
|
8 |
+
"do_normalize": true,
|
9 |
+
"do_rescale": true,
|
10 |
+
"do_resize": true,
|
11 |
+
"feature_extractor_type": "CLIPFeatureExtractor",
|
12 |
+
"image_mean": [
|
13 |
+
0.48145466,
|
14 |
+
0.4578275,
|
15 |
+
0.40821073
|
16 |
+
],
|
17 |
+
"image_processor_type": "CLIPImageProcessor",
|
18 |
+
"image_std": [
|
19 |
+
0.26862954,
|
20 |
+
0.26130258,
|
21 |
+
0.27577711
|
22 |
+
],
|
23 |
+
"resample": 3,
|
24 |
+
"rescale_factor": 0.00392156862745098,
|
25 |
+
"size": {
|
26 |
+
"shortest_edge": 224
|
27 |
+
}
|
28 |
+
}
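
As a rough illustration of what `do_rescale` and `do_normalize` imply for a single pixel, the sketch below applies the `rescale_factor`, `image_mean`, and `image_std` values from this config. Resizing and center cropping are left out, and the helper name `normalize_pixel` is hypothetical; this is not the `CLIPImageProcessor` implementation itself.

```python
# Values copied from feature_extractor/preprocessor_config.json.
IMAGE_MEAN = [0.48145466, 0.4578275, 0.40821073]
IMAGE_STD = [0.26862954, 0.26130258, 0.27577711]
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255


def normalize_pixel(rgb):
    """Rescale 0..255 channel values to 0..1, then standardize per channel."""
    return [
        (channel * RESCALE_FACTOR - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])
print("white:", white)
print("black:", black)
```

A white pixel maps to positive values and a black pixel to negative ones, since each channel mean sits strictly between 0 and 1.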
|
model.ckpt
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:70a38ac1c4eb9d0e1ecbdf99f35d44a0cce9a7cbef04ef2360c3f3d85c12a299
|
3 |
+
size 4181988676
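
The three lines above are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and it assumes the simple `key value` layout shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dictionary."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the model.ckpt entry in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:70a38ac1c4eb9d0e1ecbdf99f35d44a0cce9a7cbef04ef2360c3f3d85c12a299
size 4181988676
"""

pointer = parse_lfs_pointer(pointer_text)
size_gib = pointer["size_bytes"] / 2**30
print(pointer["hash_algo"], pointer["digest"][:12], f"{size_gib:.2f} GiB")
```

The same parser applies to the other `.bin` pointers in this commit, which is a quick way to total the download size before cloning.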
|
model_index.json
ADDED
@@ -0,0 +1,33 @@
|
1 |
+
{
|
2 |
+
"_class_name": "StableDiffusionPipeline",
|
3 |
+
"_diffusers_version": "0.11.1",
|
4 |
+
"feature_extractor": [
|
5 |
+
"transformers",
|
6 |
+
"CLIPImageProcessor"
|
7 |
+
],
|
8 |
+
"requires_safety_checker": true,
|
9 |
+
"safety_checker": [
|
10 |
+
"stable_diffusion",
|
11 |
+
"StableDiffusionSafetyChecker"
|
12 |
+
],
|
13 |
+
"scheduler": [
|
14 |
+
"diffusers",
|
15 |
+
"PNDMScheduler"
|
16 |
+
],
|
17 |
+
"text_encoder": [
|
18 |
+
"transformers",
|
19 |
+
"BertModel"
|
20 |
+
],
|
21 |
+
"tokenizer": [
|
22 |
+
"transformers",
|
23 |
+
"BertTokenizer"
|
24 |
+
],
|
25 |
+
"unet": [
|
26 |
+
"diffusers",
|
27 |
+
"UNet2DConditionModel"
|
28 |
+
],
|
29 |
+
"vae": [
|
30 |
+
"diffusers",
|
31 |
+
"AutoencoderKL"
|
32 |
+
]
|
33 |
+
}
|
safety_checker/config.json
ADDED
@@ -0,0 +1,181 @@
|
1 |
+
{
|
2 |
+
"_commit_hash": null,
|
3 |
+
"_name_or_path": "/cognitive_comp/wuxiaojun/pretrained/pytorch/Taiyi-Stable-Diffusion-1B-Chinese-v0.1/safety_checker",
|
4 |
+
"architectures": [
|
5 |
+
"StableDiffusionSafetyChecker"
|
6 |
+
],
|
7 |
+
"initializer_factor": 1.0,
|
8 |
+
"logit_scale_init_value": 2.6592,
|
9 |
+
"model_type": "clip",
|
10 |
+
"projection_dim": 768,
|
11 |
+
"text_config": {
|
12 |
+
"_name_or_path": "",
|
13 |
+
"add_cross_attention": false,
|
14 |
+
"architectures": null,
|
15 |
+
"attention_dropout": 0.0,
|
16 |
+
"bad_words_ids": null,
|
17 |
+
"begin_suppress_tokens": null,
|
18 |
+
"bos_token_id": 0,
|
19 |
+
"chunk_size_feed_forward": 0,
|
20 |
+
"cross_attention_hidden_size": null,
|
21 |
+
"decoder_start_token_id": null,
|
22 |
+
"diversity_penalty": 0.0,
|
23 |
+
"do_sample": false,
|
24 |
+
"dropout": 0.0,
|
25 |
+
"early_stopping": false,
|
26 |
+
"encoder_no_repeat_ngram_size": 0,
|
27 |
+
"eos_token_id": 2,
|
28 |
+
"exponential_decay_length_penalty": null,
|
29 |
+
"finetuning_task": null,
|
30 |
+
"forced_bos_token_id": null,
|
31 |
+
"forced_eos_token_id": null,
|
32 |
+
"hidden_act": "quick_gelu",
|
33 |
+
"hidden_size": 768,
|
34 |
+
"id2label": {
|
35 |
+
"0": "LABEL_0",
|
36 |
+
"1": "LABEL_1"
|
37 |
+
},
|
38 |
+
"initializer_factor": 1.0,
|
39 |
+
"initializer_range": 0.02,
|
40 |
+
"intermediate_size": 3072,
|
41 |
+
"is_decoder": false,
|
42 |
+
"is_encoder_decoder": false,
|
43 |
+
"label2id": {
|
44 |
+
"LABEL_0": 0,
|
45 |
+
"LABEL_1": 1
|
46 |
+
},
|
47 |
+
"layer_norm_eps": 1e-05,
|
48 |
+
"length_penalty": 1.0,
|
49 |
+
"max_length": 20,
|
50 |
+
"max_position_embeddings": 77,
|
51 |
+
"min_length": 0,
|
52 |
+
"model_type": "clip_text_model",
|
53 |
+
"no_repeat_ngram_size": 0,
|
54 |
+
"num_attention_heads": 12,
|
55 |
+
"num_beam_groups": 1,
|
56 |
+
"num_beams": 1,
|
57 |
+
"num_hidden_layers": 12,
|
58 |
+
"num_return_sequences": 1,
|
59 |
+
"output_attentions": false,
|
60 |
+
"output_hidden_states": false,
|
61 |
+
"output_scores": false,
|
62 |
+
"pad_token_id": 1,
|
63 |
+
"prefix": null,
|
64 |
+
"problem_type": null,
|
65 |
+
"projection_dim": 512,
|
66 |
+
"pruned_heads": {},
|
67 |
+
"remove_invalid_values": false,
|
68 |
+
"repetition_penalty": 1.0,
|
69 |
+
"return_dict": true,
|
70 |
+
"return_dict_in_generate": false,
|
71 |
+
"sep_token_id": null,
|
72 |
+
"suppress_tokens": null,
|
73 |
+
"task_specific_params": null,
|
74 |
+
"temperature": 1.0,
|
75 |
+
"tf_legacy_loss": false,
|
76 |
+
"tie_encoder_decoder": false,
|
77 |
+
"tie_word_embeddings": true,
|
78 |
+
"tokenizer_class": null,
|
79 |
+
"top_k": 50,
|
80 |
+
"top_p": 1.0,
|
81 |
+
"torch_dtype": null,
|
82 |
+
"torchscript": false,
|
83 |
+
"transformers_version": "4.25.1",
|
84 |
+
"typical_p": 1.0,
|
85 |
+
"use_bfloat16": false,
|
86 |
+
"vocab_size": 49408
|
87 |
+
},
|
88 |
+
"text_config_dict": {
|
89 |
+
"hidden_size": 768,
|
90 |
+
"intermediate_size": 3072,
|
91 |
+
"num_attention_heads": 12,
|
92 |
+
"num_hidden_layers": 12
|
93 |
+
},
|
94 |
+
"torch_dtype": "float32",
|
95 |
+
"transformers_version": null,
|
96 |
+
"vision_config": {
|
97 |
+
"_name_or_path": "",
|
98 |
+
"add_cross_attention": false,
|
99 |
+
"architectures": null,
|
100 |
+
"attention_dropout": 0.0,
|
101 |
+
"bad_words_ids": null,
|
102 |
+
"begin_suppress_tokens": null,
|
103 |
+
"bos_token_id": null,
|
104 |
+
"chunk_size_feed_forward": 0,
|
105 |
+
"cross_attention_hidden_size": null,
|
106 |
+
"decoder_start_token_id": null,
|
107 |
+
"diversity_penalty": 0.0,
|
108 |
+
"do_sample": false,
|
109 |
+
"dropout": 0.0,
|
110 |
+
"early_stopping": false,
|
111 |
+
"encoder_no_repeat_ngram_size": 0,
|
112 |
+
"eos_token_id": null,
|
113 |
+
"exponential_decay_length_penalty": null,
|
114 |
+
"finetuning_task": null,
|
115 |
+
"forced_bos_token_id": null,
|
116 |
+
"forced_eos_token_id": null,
|
117 |
+
"hidden_act": "quick_gelu",
|
118 |
+
"hidden_size": 1024,
|
119 |
+
"id2label": {
|
120 |
+
"0": "LABEL_0",
|
121 |
+
"1": "LABEL_1"
|
122 |
+
},
|
123 |
+
"image_size": 224,
|
124 |
+
"initializer_factor": 1.0,
|
125 |
+
"initializer_range": 0.02,
|
126 |
+
"intermediate_size": 4096,
|
127 |
+
"is_decoder": false,
|
128 |
+
"is_encoder_decoder": false,
|
129 |
+
"label2id": {
|
130 |
+
"LABEL_0": 0,
|
131 |
+
"LABEL_1": 1
|
132 |
+
},
|
133 |
+
"layer_norm_eps": 1e-05,
|
134 |
+
"length_penalty": 1.0,
|
135 |
+
"max_length": 20,
|
136 |
+
"min_length": 0,
|
137 |
+
"model_type": "clip_vision_model",
|
138 |
+
"no_repeat_ngram_size": 0,
|
139 |
+
"num_attention_heads": 16,
|
140 |
+
"num_beam_groups": 1,
|
141 |
+
"num_beams": 1,
|
142 |
+
"num_channels": 3,
|
143 |
+
"num_hidden_layers": 24,
|
144 |
+
"num_return_sequences": 1,
|
145 |
+
"output_attentions": false,
|
146 |
+
"output_hidden_states": false,
|
147 |
+
"output_scores": false,
|
148 |
+
"pad_token_id": null,
|
149 |
+
"patch_size": 14,
|
150 |
+
"prefix": null,
|
151 |
+
"problem_type": null,
|
152 |
+
"projection_dim": 512,
|
153 |
+
"pruned_heads": {},
|
154 |
+
"remove_invalid_values": false,
|
155 |
+
"repetition_penalty": 1.0,
|
156 |
+
"return_dict": true,
|
157 |
+
"return_dict_in_generate": false,
|
158 |
+
"sep_token_id": null,
|
159 |
+
"suppress_tokens": null,
|
160 |
+
"task_specific_params": null,
|
161 |
+
"temperature": 1.0,
|
162 |
+
"tf_legacy_loss": false,
|
163 |
+
"tie_encoder_decoder": false,
|
164 |
+
"tie_word_embeddings": true,
|
165 |
+
"tokenizer_class": null,
|
166 |
+
"top_k": 50,
|
167 |
+
"top_p": 1.0,
|
168 |
+
"torch_dtype": null,
|
169 |
+
"torchscript": false,
|
170 |
+
"transformers_version": "4.25.1",
|
171 |
+
"typical_p": 1.0,
|
172 |
+
"use_bfloat16": false
|
173 |
+
},
|
174 |
+
"vision_config_dict": {
|
175 |
+
"hidden_size": 1024,
|
176 |
+
"intermediate_size": 4096,
|
177 |
+
"num_attention_heads": 16,
|
178 |
+
"num_hidden_layers": 24,
|
179 |
+
"patch_size": 14
|
180 |
+
}
|
181 |
+
}
|
safety_checker/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:16d28f2b37109f222cdc33620fdd262102ac32112be0352a7f77e9614b35a394
|
3 |
+
size 1216064769
|
scheduler/scheduler_config.json
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
{
|
2 |
+
"_class_name": "PNDMScheduler",
|
3 |
+
"_diffusers_version": "0.11.1",
|
4 |
+
"beta_end": 0.012,
|
5 |
+
"beta_schedule": "scaled_linear",
|
6 |
+
"beta_start": 0.00085,
|
7 |
+
"num_train_timesteps": 1000,
|
8 |
+
"prediction_type": "epsilon",
|
9 |
+
"set_alpha_to_one": false,
|
10 |
+
"skip_prk_steps": true,
|
11 |
+
"steps_offset": 1,
|
12 |
+
"trained_betas": null
|
13 |
+
}
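
To make the `scaled_linear` schedule concrete, the sketch below rebuilds the beta values from `beta_start`, `beta_end`, and `num_train_timesteps` in plain Python. It assumes the usual definition, where the square roots of the betas are spaced linearly and then squared; the function name `scaled_linear_betas` is hypothetical, and the library itself computes this with tensors.

```python
import math


def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Betas whose square roots are linearly spaced between the two endpoints."""
    start, end = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]


betas = scaled_linear_betas()

# Cumulative product of (1 - beta): the fraction of signal left at each step.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

print(f"first beta: {betas[0]:.5f}, last beta: {betas[-1]:.5f}")
print(f"signal left at the final step: {alphas_cumprod[-1]:.5f}")
```

The cumulative product falls steadily toward zero, so by the last training timestep the sample is almost pure noise.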
|
text_encoder/config.json
ADDED
@@ -0,0 +1,34 @@
|
1 |
+
{
|
2 |
+
"_name_or_path": "/cognitive_comp/wuxiaojun/pretrained/pytorch/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",
|
3 |
+
"architectures": [
|
4 |
+
"BertModel"
|
5 |
+
],
|
6 |
+
"attention_probs_dropout_prob": 0.1,
|
7 |
+
"bos_token_id": 0,
|
8 |
+
"classifier_dropout": null,
|
9 |
+
"directionality": "bidi",
|
10 |
+
"eos_token_id": 2,
|
11 |
+
"hidden_act": "gelu",
|
12 |
+
"hidden_dropout_prob": 0.1,
|
13 |
+
"hidden_size": 768,
|
14 |
+
"initializer_range": 0.02,
|
15 |
+
"intermediate_size": 3072,
|
16 |
+
"layer_norm_eps": 1e-12,
|
17 |
+
"max_position_embeddings": 512,
|
18 |
+
"model_type": "bert",
|
19 |
+
"num_attention_heads": 12,
|
20 |
+
"num_hidden_layers": 12,
|
21 |
+
"output_past": true,
|
22 |
+
"pad_token_id": 0,
|
23 |
+
"pooler_fc_size": 768,
|
24 |
+
"pooler_num_attention_heads": 12,
|
25 |
+
"pooler_num_fc_layers": 3,
|
26 |
+
"pooler_size_per_head": 128,
|
27 |
+
"pooler_type": "first_token_transform",
|
28 |
+
"position_embedding_type": "absolute",
|
29 |
+
"torch_dtype": "bfloat16",
|
30 |
+
"transformers_version": "4.25.1",
|
31 |
+
"type_vocab_size": 2,
|
32 |
+
"use_cache": true,
|
33 |
+
"vocab_size": 21128
|
34 |
+
}
|
text_encoder/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0832dbed1bcd2cdfd87f031a2b9892f7f566f5b45283b863cf8f1f019371090d
|
3 |
+
size 1923608577
|
tokenizer/special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
|
1 |
+
{
|
2 |
+
"cls_token": "[CLS]",
|
3 |
+
"mask_token": "[MASK]",
|
4 |
+
"pad_token": "[PAD]",
|
5 |
+
"sep_token": "[SEP]",
|
6 |
+
"unk_token": "[UNK]"
|
7 |
+
}
|
tokenizer/tokenizer_config.json
ADDED
@@ -0,0 +1,16 @@
|
1 |
+
{
|
2 |
+
"cls_token": "[CLS]",
|
3 |
+
"do_basic_tokenize": true,
|
4 |
+
"do_lower_case": true,
|
5 |
+
"mask_token": "[MASK]",
|
6 |
+
"model_max_length": 512,
|
7 |
+
"name_or_path": "/cognitive_comp/wuxiaojun/pretrained/pytorch/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",
|
8 |
+
"never_split": null,
|
9 |
+
"pad_token": "[PAD]",
|
10 |
+
"sep_token": "[SEP]",
|
11 |
+
"special_tokens_map_file": "/home/chenweifeng/.cache/huggingface/hub/models--hfl--chinese-roberta-wwm-ext/snapshots/5c58d0b8ec1d9014354d691c538661bf00bfdb44/special_tokens_map.json",
|
12 |
+
"strip_accents": null,
|
13 |
+
"tokenize_chinese_chars": true,
|
14 |
+
"tokenizer_class": "BertTokenizer",
|
15 |
+
"unk_token": "[UNK]"
|
16 |
+
}
|
tokenizer/vocab.txt
ADDED
The diff for this file is too large to render.
|
|
unet/config.json
ADDED
@@ -0,0 +1,45 @@
|
1 |
+
{
|
2 |
+
"_class_name": "UNet2DConditionModel",
|
3 |
+
"_diffusers_version": "0.11.1",
|
4 |
+
"_name_or_path": "/cognitive_comp/wuxiaojun/pretrained/pytorch/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",
|
5 |
+
"act_fn": "silu",
|
6 |
+
"attention_head_dim": 8,
|
7 |
+
"block_out_channels": [
|
8 |
+
320,
|
9 |
+
640,
|
10 |
+
1280,
|
11 |
+
1280
|
12 |
+
],
|
13 |
+
"center_input_sample": false,
|
14 |
+
"class_embed_type": null,
|
15 |
+
"cross_attention_dim": 768,
|
16 |
+
"down_block_types": [
|
17 |
+
"CrossAttnDownBlock2D",
|
18 |
+
"CrossAttnDownBlock2D",
|
19 |
+
"CrossAttnDownBlock2D",
|
20 |
+
"DownBlock2D"
|
21 |
+
],
|
22 |
+
"downsample_padding": 1,
|
23 |
+
"dual_cross_attention": false,
|
24 |
+
"flip_sin_to_cos": true,
|
25 |
+
"freq_shift": 0,
|
26 |
+
"in_channels": 4,
|
27 |
+
"layers_per_block": 2,
|
28 |
+
"mid_block_scale_factor": 1,
|
29 |
+
"mid_block_type": "UNetMidBlock2DCrossAttn",
|
30 |
+
"norm_eps": 1e-05,
|
31 |
+
"norm_num_groups": 32,
|
32 |
+
"num_class_embeds": null,
|
33 |
+
"only_cross_attention": false,
|
34 |
+
"out_channels": 4,
|
35 |
+
"resnet_time_scale_shift": "default",
|
36 |
+
"sample_size": 64,
|
37 |
+
"up_block_types": [
|
38 |
+
"UpBlock2D",
|
39 |
+
"CrossAttnUpBlock2D",
|
40 |
+
"CrossAttnUpBlock2D",
|
41 |
+
"CrossAttnUpBlock2D"
|
42 |
+
],
|
43 |
+
"upcast_attention": false,
|
44 |
+
"use_linear_projection": false
|
45 |
+
}
|
unet/diffusion_pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d237b4e56b25d1f237ca84660c850ecdd72c344a471ead11c3a54b0f7ffe16c8
|
3 |
+
size 1923724007
|
vae/config.json
ADDED
@@ -0,0 +1,30 @@
|
1 |
+
{
|
2 |
+
"_class_name": "AutoencoderKL",
|
3 |
+
"_diffusers_version": "0.11.1",
|
4 |
+
"_name_or_path": "/cognitive_comp/wuxiaojun/pretrained/pytorch/Taiyi-Stable-Diffusion-1B-Chinese-v0.1/vae",
|
5 |
+
"act_fn": "silu",
|
6 |
+
"block_out_channels": [
|
7 |
+
128,
|
8 |
+
256,
|
9 |
+
512,
|
10 |
+
512
|
11 |
+
],
|
12 |
+
"down_block_types": [
|
13 |
+
"DownEncoderBlock2D",
|
14 |
+
"DownEncoderBlock2D",
|
15 |
+
"DownEncoderBlock2D",
|
16 |
+
"DownEncoderBlock2D"
|
17 |
+
],
|
18 |
+
"in_channels": 3,
|
19 |
+
"latent_channels": 4,
|
20 |
+
"layers_per_block": 2,
|
21 |
+
"norm_num_groups": 32,
|
22 |
+
"out_channels": 3,
|
23 |
+
"sample_size": 512,
|
24 |
+
"up_block_types": [
|
25 |
+
"UpDecoderBlock2D",
|
26 |
+
"UpDecoderBlock2D",
|
27 |
+
"UpDecoderBlock2D",
|
28 |
+
"UpDecoderBlock2D"
|
29 |
+
]
|
30 |
+
}
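
The UNet and VAE configs in this commit fit together through the VAE's downsampling factor. The arithmetic below is a sketch under one assumption: each VAE block after the first halves the spatial resolution, so four `block_out_channels` entries give a factor of 8. The variable names are illustrative.

```python
# Values copied from vae/config.json and unet/config.json in this commit.
vae_block_out_channels = [128, 256, 512, 512]
latent_channels = 4
unet_sample_size = 64

# Assumption: one 2x downsampling between consecutive VAE blocks.
vae_scale_factor = 2 ** (len(vae_block_out_channels) - 1)

# Pixel resolution that corresponds to the UNet's latent grid.
image_size = unet_sample_size * vae_scale_factor

# Shape of one latent the UNet denoises (channels, height, width).
latent_shape = (latent_channels, unet_sample_size, unet_sample_size)

print(f"scale factor {vae_scale_factor}, image {image_size}x{image_size}, latent {latent_shape}")
```

Under that assumption the 64 x 64 latent grid corresponds to 512 x 512 pixels, which agrees with the VAE's `sample_size` of 512.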
|
vae/diffusion_pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:af27ea858349760ebe3311953e0bfe8d6fd257dc9537ae0b2b938c262132a2c6
|
3 |
+
size 334711857
|