---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
tags:
- Text-to-Image
- FLUX
- Stable Diffusion
pipeline_tag: text-to-image
---

<div style="display: flex; justify-content: center; align-items: center;">
  <img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
  <img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
</div>

[Chinese README](./README_ZH.md)

This repository provides an 8-step distilled LoRA for the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) model, released by the AlimamaCreative team.

# Description
This checkpoint is an 8-step distilled LoRA trained on top of the FLUX.1-dev model. We use a multi-head discriminator to improve distillation quality. The LoRA can be used for text-to-image generation, the inpainting ControlNet, and other FLUX-based models. The recommended settings are guidance_scale=3.5 and lora_scale=1. A lower-step version will be released later.

- Text-to-Image.

![](./images/T2I.png)

- With [alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta](https://huggingface.co/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta): our distilled LoRA adapts well to the inpainting ControlNet, and the accelerated results follow the original (non-accelerated) output closely. A usage sketch follows the image below.

![](./images/inpaint.png)
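A minimal sketch of combining the two, assuming diffusers' `FluxControlNetInpaintPipeline` and `FluxControlNetModel`; the prompt and the image/mask paths are placeholders, and the ControlNet's own model card remains the authoritative reference for its exact conditioning:

```python
import torch
from diffusers import FluxControlNetInpaintPipeline, FluxControlNetModel
from diffusers.utils import load_image

# Load the inpainting ControlNet and the base FLUX.1-dev pipeline.
controlnet = FluxControlNetModel.from_pretrained(
    "alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta",
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)

# Add the 8-step turbo LoRA on top.
pipe.load_lora_weights("alimama-creative/FLUX.1-Turbo-Alpha")
pipe.fuse_lora()
pipe.to("cuda")

image = load_image("image.png")  # placeholder: image to inpaint
mask = load_image("mask.png")    # placeholder: white = region to repaint

result = pipe(
    prompt="a cat sitting on a bench",  # placeholder prompt
    image=image,
    mask_image=mask,
    control_image=image,
    num_inference_steps=8,
    guidance_scale=3.5,
).images[0]
result.save("inpainted.png")
```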

# How to use
## diffusers
This model can be used directly with diffusers:

```python
import torch
from diffusers import FluxPipeline

model_id = "black-forest-labs/FLUX.1-dev"
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"

# Load the base FLUX.1-dev pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load the 8-step turbo LoRA and fuse it at the recommended lora_scale=1.
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora(lora_scale=1.0)

prompt = "A DSLR photo of a shiny VW van that has a cityscape painted on it. A smiling sloth stands on grass in front of the van and is wearing a leather jacket, a cowboy hat, a kilt and a bowtie. The sloth is holding a quarterstaff and a big book."

# 8 inference steps with the recommended guidance_scale=3.5.
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=8,
    max_sequence_length=512,
).images[0]
image.save("output.png")
```
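
If you prefer to keep the LoRA unfused (for example, to toggle it on and off), diffusers also lets you set the LoRA scale per call; a minimal variant, assuming the standard `joint_attention_kwargs` mechanism of the Flux pipelines:

```python
# Variant: skip fuse_lora() and pass the LoRA scale at call time.
pipe.load_lora_weights(adapter_id)
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_inference_steps=8,
    max_sequence_length=512,
    joint_attention_kwargs={"scale": 1.0},  # recommended lora_scale=1
).images[0]
```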

## ComfyUI

- T2I turbo workflow: [click here](./workflows/t2I_flux_turbo.json)
- Inpainting controlnet turbo workflow: [click here](./workflows/alimama_flux_inpainting_turbo_8step.json)


# Training Details

The model is trained on 1M images from open-source and internal sources, filtered for an aesthetic score of 6.3+ and a resolution greater than 800. We use adversarial training to improve quality: the original FLUX.1-dev transformer is kept fixed as the discriminator backbone, and multiple discriminator heads are added to every transformer layer. We fix the guidance scale at 3.5 during training and use a time shift of 3.
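
For intuition, here is a conceptual sketch of the multi-head discriminator setup. This is illustrative only, not the authors' code: the class names are hypothetical, and the real implementation taps the hidden states of the actual FLUX transformer layers (e.g. via forward hooks).

```python
import torch
import torch.nn as nn

class DiscriminatorHead(nn.Module):
    """Hypothetical per-layer head scoring hidden states as real/fake."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.LayerNorm(hidden_dim),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool over the token dimension, then emit one logit.
        return self.score(hidden_states.mean(dim=1))

class MultiHeadDiscriminator(nn.Module):
    """Frozen transformer backbone plus one trainable head per layer."""
    def __init__(self, backbone: nn.Module, num_layers: int, hidden_dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the FLUX.1-dev backbone stays fixed
        self.heads = nn.ModuleList(
            [DiscriminatorHead(hidden_dim) for _ in range(num_layers)]
        )

    def forward(self, layer_hidden_states: list[torch.Tensor]) -> torch.Tensor:
        # One logit per layer; the adversarial loss aggregates over heads.
        logits = [h(x) for h, x in zip(self.heads, layer_hidden_states)]
        return torch.stack(logits, dim=0).sum(dim=0)
```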

- Mixed precision: bf16
- Learning rate: 2e-5
- Batch size: 64
- Image size: 1024x1024