|
--- |
|
license: other |
|
license_name: flux-1-dev-non-commercial-license |
|
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md |
|
language: |
|
- en |
|
base_model: black-forest-labs/FLUX.1-dev |
|
library_name: diffusers |
|
tags: |
|
- Text-to-Image |
|
- FLUX |
|
- Stable Diffusion |
|
pipeline_tag: text-to-image |
|
--- |
|
|
|
<div style="display: flex; justify-content: center; align-items: center;"> |
|
<img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;"> |
|
<img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;"> |
|
</div> |
|
|
|
[中文版 Readme (Chinese README)](./README_ZH.md)
|
|
|
This repository provides an 8-step distilled LoRA for the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) model, released by the AlimamaCreative Team.
|
|
|
# Description |
|
This checkpoint is an 8-step distilled LoRA trained on top of the FLUX.1-dev model. We use a multi-head discriminator to improve the distillation quality. The LoRA can be used for text-to-image generation, the inpainting ControlNet, and other FLUX-related models. The recommended settings are guidance_scale=3.5 and lora_scale=1. A version with fewer steps will be released later.
|
|
|
- Text-to-Image. |
|
|
|
![](./images/T2I.png) |
|
|
|
- With [alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta](https://huggingface.co/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta): our distilled LoRA adapts well to the inpainting ControlNet, and the accelerated results follow the original model's output closely. A code sketch is included at the end of the diffusers section below.
|
|
|
![](./images/inpaint.png) |
|
|
|
# How to use |
|
## diffusers |
|
This model can be used directly with diffusers:
|
|
|
```python
import torch
from diffusers.pipelines import FluxPipeline

model_id = "black-forest-labs/FLUX.1-dev"
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"

# Load the base FLUX.1-dev pipeline.
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Load the 8-step turbo LoRA and merge it into the base weights
# (the recommended lora_scale=1 is fuse_lora's default).
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A DSLR photo of a shiny VW van that has a cityscape painted on it. A smiling sloth stands on grass in front of the van and is wearing a leather jacket, a cowboy hat, a kilt and a bowtie. The sloth is holding a quarterstaff and a big book."

image = pipe(
    prompt=prompt,
    guidance_scale=3.5,        # recommended guidance scale
    height=1024,
    width=1024,
    num_inference_steps=8,     # 8-step distilled sampling
    max_sequence_length=512,
).images[0]
image.save("output.png")
```
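The same LoRA can be stacked on the inpainting ControlNet mentioned above. The following is only a minimal sketch using the generic `FluxControlNetInpaintPipeline` from recent diffusers releases; the file names and prompt are placeholders, and the ControlNet's own model card documents its recommended pipeline and preprocessing, which may differ from this generic one.

```python
import torch
from diffusers import FluxControlNetInpaintPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Sketch only: see the ControlNet model card for the exact recommended setup.
controlnet = FluxControlNetModel.from_pretrained(
    "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Stack the 8-step turbo LoRA on top of the base transformer.
pipe.load_lora_weights("alimama-creative/FLUX.1-Turbo-Alpha")
pipe.fuse_lora()

image = load_image("image.png")  # placeholder input image
mask = load_image("mask.png")    # placeholder inpainting mask

result = pipe(
    prompt="a cat sitting on a bench",  # placeholder prompt
    image=image,
    mask_image=mask,
    control_image=image,
    strength=1.0,                # run all 8 denoising steps
    num_inference_steps=8,
    guidance_scale=3.5,
).images[0]
result.save("inpainted.png")
```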
|
|
|
## ComfyUI
|
|
|
- T2I turbo workflow: [click here](./workflows/t2I_flux_turbo.json) |
|
- Inpainting controlnet turbo workflow: [click here](./workflows/alimama_flux_inpainting_turbo_8step.json) |
|
|
|
|
|
# Training Details |
|
|
|
The model is trained on 1M images from open-source and internal sources, filtered for an aesthetic score of 6.3+ and a resolution greater than 800. We use adversarial training to improve quality: the original FLUX.1-dev transformer is frozen and used as the discriminator backbone, with multiple discriminator heads added to every transformer layer. The guidance scale is fixed at 3.5 during training, and the time shift is set to 3.
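The training code is not part of this release, so the following is only an illustrative sketch of what such a per-layer multi-head discriminator can look like: a small classification head is attached to the hidden states of each frozen backbone layer, and the per-layer logits are averaged for the adversarial loss. The class, head architecture, and pooling here are hypothetical; 3072 is the FLUX transformer's hidden size.

```python
import torch
import torch.nn as nn

class MultiHeadDiscriminator(nn.Module):
    """Illustrative sketch: one lightweight head per frozen transformer layer.

    `backbone_features` stands for the list of per-layer hidden states
    produced by the frozen FLUX.1-dev transformer; the names and head
    design are hypothetical, not the released training code.
    """

    def __init__(self, num_layers: int, hidden_dim: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.LayerNorm(hidden_dim),
                nn.Linear(hidden_dim, hidden_dim // 4),
                nn.SiLU(),
                nn.Linear(hidden_dim // 4, 1),
            )
            for _ in range(num_layers)
        )

    def forward(self, backbone_features: list[torch.Tensor]) -> torch.Tensor:
        # Each feature tensor has shape (batch, seq_len, hidden_dim).
        # Pool over the sequence, score with that layer's head, then
        # average the per-layer real/fake logits.
        logits = [
            head(feat.mean(dim=1))
            for head, feat in zip(self.heads, backbone_features)
        ]
        return torch.stack(logits, dim=0).mean(dim=0)  # (batch, 1)

# Toy usage: 2 samples, sequence length 77, 19 hypothetical layers.
disc = MultiHeadDiscriminator(num_layers=19, hidden_dim=3072)
feats = [torch.randn(2, 77, 3072) for _ in range(19)]
print(disc(feats).shape)  # torch.Size([2, 1])
```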
|
|
|
- Mixed precision: bf16
- Learning rate: 2e-5
- Batch size: 64
- Image size: 1024x1024