S3Diff Model Card

This model card covers the models associated with S3Diff, available here.

Model Details

Developed by: Aiping Zhang
Model type: Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Model Description: This is the model used in the paper.
Resources for more information: GitHub Repository.

Cite as:

@article{2024s3diff,
  author    = {Aiping Zhang and Zongsheng Yue and Renjing Pei and Wenqi Ren and Xiaochun Cao},
  title     = {Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors},
  journal   = {arxiv},
  year      = {2024},
}

Limitations and Bias

Limitations

S3Diff requires tiled processing to generate high-resolution images, which substantially increases inference time.
S3Diff sometimes cannot fully preserve fidelity to the input because of its generative nature.
S3Diff sometimes cannot generate accurate details in complex real-world scenes.
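
To illustrate why tiling adds inference time, here is a minimal sketch of how an image can be split into overlapping tiles, each of which would need its own forward pass. The function name `tile_boxes`, the tile size of 512, and the overlap of 64 are assumptions chosen for illustration; they are not taken from the S3Diff code, which may tile differently (for example in latent space).

```python
def tile_boxes(height, width, tile=512, overlap=64):
    """Return (top, left, bottom, right) boxes that cover the image.

    Hypothetical helper: tile and overlap values are illustrative only.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        # A single tile is enough when the image fits inside it.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the border
        return positions

    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in starts(height)
        for left in starts(width)
    ]


boxes = tile_boxes(1024, 1024)
print(len(boxes))  # number of forward passes needed under these assumptions
```

Under these assumed settings a 1024 x 1024 image needs nine tiles, so the cost grows roughly with the output area, plus the extra work caused by the overlap.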

Bias

Although our model is based on a pre-trained SD-Turbo model, we have not observed obvious bias in the generated results so far. We conjecture that the main reason is that the model is conditioned on low-resolution images rather than on text prompts. Such a strong condition makes the outputs less likely to be affected.

Training

Training Data

The model developers used the following datasets to train the model:

Our model is finetuned on the LSDIR dataset plus 10K samples from the FFHQ dataset.

Training Procedure

S3Diff is an image super-resolution model finetuned on SD-Turbo, further equipped with a degradation-guided LoRA and online negative prompting.

Following SD-Turbo, images are encoded by a fixed autoencoder that turns them into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
The LR images are fed to the degradation estimation network from mm-realsr, which predicts degradation scores.
We inject LoRA layers only into the VAE encoder and the UNet.
The total loss combines an L2 loss, an LPIPS loss, and a GAN loss.
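
The shape mapping performed by the autoencoder can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `latent_shape` is hypothetical, and the requirement that both sides be divisible by the factor is an assumption made here for simplicity, not a statement about how S3Diff handles other sizes.

```python
def latent_shape(height, width, f=8, latent_channels=4):
    """Map an H x W x 3 image shape to its H/f x W/f x 4 latent shape.

    Hypothetical helper for illustration; f=8 and 4 latent channels
    follow the description in this model card.
    """
    if height % f or width % f:
        # Assumption: inputs are sized to a multiple of f beforehand.
        raise ValueError(f"height and width must be multiples of {f}")
    return (height // f, width // f, latent_channels)


print(latent_shape(512, 512))
```

For example, a 512 x 512 x 3 image maps to a 64 x 64 x 4 latent, which has 48 times fewer values than the input image.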

We currently provide the following checkpoints:

s3diff.pkl: S3Diff finetuned on SD-Turbo for 30k iterations.
de_net.pth: The degradation estimation network, extracted from mm-realsr.

Evaluation Results

See the paper for details.