InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5

2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5.

We use text-conditioned reflow as described in our paper.
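As a rough illustration of what a reflow objective looks like, here is a minimal pure-Python sketch. It assumes the standard rectified flow formulation: a noise sample x0 is paired with the image x1 that the previous model generates from it under the same prompt, and the network is regressed onto the straight-line direction x1 - x0 at an interpolation time t. The names `reflow_pair` and `reflow_loss` and the toy velocity functions are hypothetical and are not taken from the official code; see the paper for the exact objective.

```python
# Sketch only: short lists stand in for latents, and `velocity` stands in
# for the text-conditioned network. All names here are hypothetical.

def reflow_pair(x0, x1, t):
    """Build the interpolated point x_t and the regression target x1 - x0."""
    x_t = [t * b + (1.0 - t) * a for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def reflow_loss(velocity, x0, x1, t, prompt):
    """Mean squared error between the predicted velocity and x1 - x0."""
    x_t, target = reflow_pair(x0, x1, t)
    pred = velocity(x_t, t, prompt)
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


x0 = [0.0, 1.0]    # noise sample
x1 = [2.0, -1.0]   # assumed output of the previous model for the same prompt
prompt = "a photo of a rabbit head on a grizzly bear body"


def zero_velocity(x_t, t, prompt):
    # A deliberately bad predictor: always outputs zero.
    return [0.0, 0.0]


def perfect_velocity(x_t, t, prompt):
    # A predictor that already outputs the straight-line direction x1 - x0.
    return [2.0, -2.0]


loss_zero = reflow_loss(zero_velocity, x0, x1, 0.25, prompt)
loss_perfect = reflow_loss(perfect_velocity, x0, x1, 0.25, prompt)
```

In a real training loop, t would be drawn at random for each example and the loss averaged over a batch; the sketch fixes t only to keep the arithmetic easy to follow.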

Reflow has interesting theoretical properties. You may check this ICLR paper and this arXiv paper.

Images Generated from Random Diffusion DB prompts

We compare SD 1.5+DPM-Solver and 2-Rectified Flow on random prompts from Diffusion DB, using the same random seeds for both. We observe that the sampling trajectories of 2-Rectified Flow are straighter.


Prompt: a renaissance portrait of dwayne johnson, art in the style of rembrandt.


Prompt: a photo of a rabbit head on a grizzly bear body.

Usage

Please refer to the official GitHub repo.

Training

Training pipeline:

Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations. The model is initialized from the pre-trained SD 1.5 weights. (11.2 A100 GPU days)
Reflow (Stage 2): We continue to train the model using the text-conditioned reflow objective with an increased batch size of 1024 for 25,000 iterations. (64 A100 GPU days)

The final model is 2-Rectified Flow.

Total Training Cost: It takes 75.2 A100 GPU days to get 2-Rectified Flow.
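The total can be checked from the per-stage figures with a few lines of Python. The sketch below only re-derives numbers stated above; the "samples processed" value is simply batch size times iterations and is an illustrative derived quantity, not a figure reported by the authors.

```python
# Figures copied from the training pipeline description above.
stages = [
    {"name": "Reflow (Stage 1)", "batch_size": 64, "iterations": 70_000, "gpu_days": 11.2},
    {"name": "Reflow (Stage 2)", "batch_size": 1024, "iterations": 25_000, "gpu_days": 64.0},
]

total_gpu_days = sum(s["gpu_days"] for s in stages)

# Samples processed = batch size x iterations (counts repeats, not unique images).
samples_seen = {s["name"]: s["batch_size"] * s["iterations"] for s in stages}

print(f"total: {total_gpu_days:.1f} A100 GPU days")
for name, n in samples_seen.items():
    print(f"{name}: {n:,} samples processed")
```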

Evaluation Results - Metrics

The following metrics for 2-Rectified Flow are measured on MS COCO 2017 using 5,000 images and a 25-step Euler solver:

FID-5k = 21.5, CLIP score = 0.315
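For readers unfamiliar with the solver mentioned above, the following is a minimal sketch of fixed-step Euler integration of an ODE dx/dt = v(x, t) from t = 0 to t = 1. The function `euler_sample` and both toy velocity fields are hypothetical stand-ins, not the model or the official sampler. The toy fields are there to show why straightness matters: when the velocity is constant along a path, one Euler step gives the same endpoint as 25 steps, whereas a curved flow does not.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with equal Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_velocity(x, t):
    # Toy straight flow: the direction never changes along the path.
    return [2.0, -2.0]


def curved_velocity(x, t):
    # Toy curved flow: the velocity depends on the current position.
    return list(x)


straight_1 = euler_sample(straight_velocity, [0.0, 1.0], 1)
straight_25 = euler_sample(straight_velocity, [0.0, 1.0], 25)
curved_1 = euler_sample(curved_velocity, [1.0], 1)
curved_25 = euler_sample(curved_velocity, [1.0], 25)

# Gap between the 1-step and 25-step endpoints for each toy flow.
straight_gap = max(abs(a - b) for a, b in zip(straight_1, straight_25))
curved_gap = abs(curved_1[0] - curved_25[0])
```

Here `straight_gap` is zero up to floating-point rounding, while `curved_gap` is clearly nonzero, which is the intuition behind using few steps on a straightened flow.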

Few-Step performance:

Evaluation Results - Impact of Guidance Scale

We evaluate the impact of the guidance scale on 2-Rectified Flow.
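The text above does not spell out how the guidance scale enters the model. As a sketch, assuming the usual classifier-free guidance form, the prompt-conditioned and unconditioned velocity predictions are combined by linear extrapolation. The function name `guided_velocity` and the toy inputs are hypothetical.

```python
def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from v_uncond toward v_cond.

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    guidance_scale > 1 pushes further in the direction of the prompt.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


v_cond = [1.0, 2.0]     # toy prediction with the text prompt
v_uncond = [0.0, 0.0]   # toy prediction with an empty prompt

no_guidance = guided_velocity(v_cond, v_uncond, 1.0)
strong_guidance = guided_velocity(v_cond, v_uncond, 1.5)
```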

Trade-off Curve:

Citation

@article{liu2023insta,
  title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
  author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
  journal={arXiv preprint arXiv:2309.06380},
  year={2023}
}