|
<!--Copyright 2024 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
--> |
|
|
|
<p align="center"> |
|
<br> |
|
<img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/> |
|
<br> |
|
</p> |
|
|
|
|
|
# Diffusers |
|
|
|
π€ Diffusersλ μ΄λ―Έμ§, μ€λμ€, μ¬μ§μ΄ λΆμμ 3D ꡬ쑰λ₯Ό μμ±νκΈ° μν μ΅μ²¨λ¨ μ¬μ νλ ¨λ diffusion λͺ¨λΈμ μν λΌμ΄λΈλ¬λ¦¬μ
λλ€. κ°λ¨ν μΆλ‘ μ루μ
μ μ°Ύκ³ μλ , μ체 diffusion λͺ¨λΈμ νλ ¨νκ³ μΆλ , π€ Diffusersλ λ κ°μ§ λͺ¨λλ₯Ό μ§μνλ λͺ¨λμ ν΄λ°μ€μ
λλ€. μ ν¬ λΌμ΄λΈλ¬λ¦¬λ [μ±λ₯λ³΄λ€ μ¬μ©μ±](conceptual/philosophy#usability-over-performance), [κ°νΈν¨λ³΄λ€ λ¨μν¨](conceptual/philosophy#simple-over-easy), κ·Έλ¦¬κ³ [μΆμνλ³΄λ€ μ¬μ©μ μ§μ κ°λ₯μ±](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction)μ μ€μ μ λκ³ μ€κ³λμμ΅λλ€. |
|
|
|
μ΄ λΌμ΄λΈλ¬λ¦¬μλ μΈ κ°μ§ μ£Όμ κ΅¬μ± μμκ° μμ΅λλ€: |
|
|
|
- λͺ μ€μ μ½λλ§μΌλ‘ μΆλ‘ ν μ μλ μ΅μ²¨λ¨ [diffusion νμ΄νλΌμΈ](api/pipelines/overview). |
|
- μμ± μλμ νμ§ κ°μ κ· νμ λ§μΆκΈ° μν΄ μνΈκ΅νμ μΌλ‘ μ¬μ©ν μ μλ [λ
Έμ΄μ¦ μ€μΌμ€λ¬](api/schedulers/overview). |
|
- λΉλ© λΈλ‘μΌλ‘ μ¬μ©ν μ μκ³ μ€μΌμ€λ¬μ κ²°ν©νμ¬ μ체μ μΈ end-to-end diffusion μμ€ν
μ λ§λ€ μ μλ μ¬μ νμ΅λ [λͺ¨λΈ](api/models). |
|
|
|
<div class="mt-10"> |
|
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5"> |
|
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/tutorial_overview" |
|
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div> |
|
<p class="text-gray-700">κ²°κ³Όλ¬Όμ μμ±νκ³ , λλ§μ diffusion μμ€ν
μ ꡬμΆνκ³ , νμ° λͺ¨λΈμ νλ ¨νλ λ° νμν κΈ°λ³Έ κΈ°μ μ λ°°μ보μΈμ. π€ Diffusersλ₯Ό μ²μ μ¬μ©νλ κ²½μ° μ¬κΈ°μμ μμνλ κ²μ΄ μ’μ΅λλ€!</p> |
|
</a> |
|
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./using-diffusers/loading_overview" |
|
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div> |
|
<p class="text-gray-700">νμ΄νλΌμΈ, λͺ¨λΈ, μ€μΌμ€λ¬λ₯Ό λ‘λνλ λ° λμμ΄ λλ μ€μ©μ μΈ κ°μ΄λμ
λλ€. λν νΉμ μμ
μ νμ΄νλΌμΈμ μ¬μ©νκ³ , μΆλ ₯ μμ± λ°©μμ μ μ΄νκ³ , μΆλ‘ μλμ λ§κ² μ΅μ ννκ³ , λ€μν νμ΅ κΈ°λ²μ μ¬μ©νλ λ°©λ²λ λ°°μΈ μ μμ΅λλ€.</p> |
|
</a> |
|
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual/philosophy" |
|
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div> |
|
<p class="text-gray-700">λΌμ΄λΈλ¬λ¦¬κ° μ μ΄λ° λ°©μμΌλ‘ μ€κ³λμλμ§ μ΄ν΄νκ³ , λΌμ΄λΈλ¬λ¦¬ μ΄μ©μ λν μ€λ¦¬μ κ°μ΄λλΌμΈκ³Ό μμ ꡬνμ λν΄ μμΈν μμ보μΈμ.</p> |
|
</a> |
|
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./api/models" |
|
><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div> |
|
<p class="text-gray-700">π€ Diffusers ν΄λμ€ λ° λ©μλμ μλ λ°©μμ λν κΈ°μ μ€λͺ
.</p> |
|
</a> |
|
</div> |
|
</div> |
|
|
|
## Supported pipelines |
|
|
|
| Pipeline | Paper/Repository | Tasks | |
|
|---|---|:---:| |
|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | |
|
| [audio_diffusion](./api/pipelines/audio_diffusion) | [Audio Diffusion](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | |
|
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | |
|
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | |
|
| [dance_diffusion](./api/pipelines/dance_diffusion) | [Dance Diffusion](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | |
|
| [ddpm](./api/pipelines/ddpm) | [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | |
|
| [ddim](./api/pipelines/ddim) | [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | |
|
| [if](./if) | [**IF**](./api/pipelines/if) | Image Generation | |
|
| [if_img2img](./if) | [**IF**](./api/pipelines/if) | Image-to-Image Generation | |
|
| [if_inpainting](./if) | [**IF**](./api/pipelines/if) | Image-to-Image Generation | |
|
| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | |
|
| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | |
|
| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | |
|
| [paint_by_example](./api/pipelines/paint_by_example) | [Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | |
|
| [pndm](./api/pipelines/pndm) | [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | |
|
| [score_sde_ve](./api/pipelines/score_sde_ve) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | |
|
| [score_sde_vp](./api/pipelines/score_sde_vp) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | |
|
| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [Semantic Guidance](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | |
|
| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | |
|
| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | |
|
| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | |
|
| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [MultiDiffusion](https://multidiffusion.github.io/) | Text-to-Panorama Generation | |
|
| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | Text-Guided Image Editing| |
|
| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [Zero-shot Image-to-Image Translation](https://pix2pixzero.github.io/) | Text-Guided Image Editing | |
|
| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation | |
|
| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation Unconditional Image Generation | |
|
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [Stable Diffusion Image Variations](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation | |
|
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [Stable Diffusion Latent Upscaler](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image | |
|
| [stable_diffusion_model_editing](./api/pipelines/stable_diffusion/model_editing) | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://time-diffusion.github.io/) | Text-to-Image Model Editing | |
|
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | |
|
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | |
|
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Depth-Conditional Stable Diffusion](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation | |
|
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | |
|
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [Safe Stable Diffusion](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | |
|
| [stable_unclip](./stable_unclip) | Stable unCLIP | Text-to-Image Generation | |
|
| [stable_unclip](./stable_unclip) | Stable unCLIP | Image-to-Image Text-Guided Generation | |
|
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | |
|
| [text_to_video_sd](./api/pipelines/text_to_video) | [Modelscope's Text-to-video-synthesis Model in Open Domain](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) | Text-to-Video Generation | |
|
| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125)(implementation by [kakaobrain](https://github.com/kakaobrain/karlo)) | Text-to-Image Generation | |
|
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | |
|
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | |
|
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation | |
|
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | |
|
|