InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5
2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5.
We use text-conditioned reflow as described in our paper.
Reflow has interesting theoretical properties. You may check this ICLR paper and this arXiv paper.
Images Generated from Random Diffusion DB prompts
We compare SD 1.5 + DPM-Solver and 2-Rectified Flow on random prompts from Diffusion DB, using the same random seeds. We observe that 2-Rectified Flow is straighter.
Usage
Please refer to the official GitHub repo.
Training
Training pipeline:
- Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations. The model is initialized from the pre-trained SD 1.5 weights. (11.2 A100 GPU days)
- Reflow (Stage 2): We continue to train the model using the text-conditioned reflow objective with an increased batch size of 1024 for 25,000 iterations. (64 A100 GPU days)
The final model is 2-Rectified Flow.
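The reflow objective used in both stages can be illustrated with a minimal sketch. It assumes the standard rectified-flow formulation, in which a noise sample X0 is paired with an image X1 that the previous model generated from that noise and the same prompt, and the network is trained to predict the straight-line direction X1 - X0 at a random point on the segment between them. The names `interpolate`, `reflow_loss`, and `toy_velocity` are hypothetical and chosen for illustration; they are not taken from the official code, and the toy velocity function stands in for the text-conditioned network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line between x0 and x1: X_t = t * X1 + (1 - t) * X0."""
    return [t * b + (1.0 - t) * a for a, b in zip(x0, x1)]


def reflow_loss(velocity_fn, x0, x1, prompt, t=None):
    """Mean squared error between the predicted velocity and the direction X1 - X0.

    velocity_fn(x_t, t, prompt) is a hypothetical stand-in for the
    text-conditioned network. If t is None, it is drawn uniformly from [0, 1).
    """
    if t is None:
        t = random.random()
    x_t = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_fn(x_t, t, prompt)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def toy_velocity(x_t, t, prompt):
    """Toy model (assumption): always predicts a velocity of 1.0 per coordinate."""
    return [1.0 for _ in x_t]


# A pair whose straight-line direction matches the toy prediction exactly.
matched = reflow_loss(toy_velocity, [0.0, 0.0], [1.0, 1.0], "a photo of a cat", t=0.5)
# A pair whose direction is (2, 2), so the toy prediction is off by 1 per coordinate.
mismatched = reflow_loss(toy_velocity, [0.0, 0.0], [2.0, 2.0], "a photo of a cat", t=0.5)
```

In real training the loss would be averaged over a batch of (noise, image, prompt) triples and minimized with respect to the network weights; the sketch only shows how a single term is formed.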
Total Training Cost: It takes 75.2 A100 GPU days to get 2-Rectified Flow.
Evaluation Results - Metrics
The following metrics for 2-Rectified Flow are measured on MS COCO 2017 using 5,000 images and a 25-step Euler solver:
FID-5k = 21.5, CLIP score = 0.315
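For readers unfamiliar with the sampler, here is a minimal sketch of fixed-step Euler integration of a velocity field from t = 0 (noise) to t = 1 (image). It is an illustration under assumptions, not the evaluation code: `euler_sample` and `constant_velocity` are hypothetical names, and the toy velocity function stands in for the trained model with the prompt held fixed.

```python
def euler_sample(velocity_fn, x0, num_steps=25):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with equal Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        # One explicit Euler update: x <- x + dt * v(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def constant_velocity(x, t):
    """Toy velocity field (assumption): a perfectly straight flow with fixed direction."""
    return [2.0, -1.0]


many_steps = euler_sample(constant_velocity, [0.0, 0.0], num_steps=25)
one_step = euler_sample(constant_velocity, [0.0, 0.0], num_steps=1)
```

With a perfectly straight flow such as the toy field above, one step and 25 steps reach the same endpoint up to floating-point rounding, which is the intuition behind using straighter flows for few-step generation.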
Few-Step performance:
Evaluation Results - Impact of Guidance Scale
We evaluate the impact of the guidance scale on 2-Rectified Flow.
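As background, the sketch below shows how a guidance scale is commonly applied, assuming the usual classifier-free guidance combination of a text-conditioned and an unconditional prediction. This card does not spell out the formula, so treat it as an assumption; `guided_velocity` is a hypothetical helper name, not part of the official code.

```python
def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance: v = v_uncond + scale * (v_cond - v_uncond).

    A scale of 1.0 returns the conditional prediction unchanged; larger
    values push the result further toward the text condition.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


# Toy predictions (assumed values, for illustration only).
v_cond = [1.0, 0.0]
v_uncond = [0.5, 0.5]
guided = guided_velocity(v_cond, v_uncond, guidance_scale=2.0)
```

Sweeping `guidance_scale` and measuring image quality and text alignment at each value is one way to produce a trade-off curve of this kind.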
Trade-off Curve:
Citation
@article{liu2023insta,
title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
journal={arXiv preprint arXiv:2309.06380},
year={2023}
}