# Kandinsky 3
Kandinsky 3 was created by Vladimir Arkhipkin, Anastasia Maltseva, Igor Pavlov, Andrei Filatov, Arseniy Shakhmatov, Andrey Kuznetsov, Denis Dimitrov, and Zein Shaheen.
The description from its GitHub page:
Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.
Its architecture includes 3 main components:
- FLAN-UL2, an encoder-decoder model based on the T5 architecture.
- A new U-Net architecture featuring BigGAN-deep blocks, which doubles the depth while keeping the same number of parameters.
- Sber-MoVQGAN, a decoder with proven superior results in image restoration.
The original codebase can be found at [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3).
Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community) organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.
Make sure to check out the schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.
## Kandinsky3Pipeline
[[autodoc]] Kandinsky3Pipeline
	- all
	- __call__
## Kandinsky3Img2ImgPipeline
[[autodoc]] Kandinsky3Img2ImgPipeline
	- all
	- __call__