metadata

license: openrail++
tags:
  - text-to-video
  - stable-diffusion

Try Hotshot-XL yourself here: https://www.hotshot.co

Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.

Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:

You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
If you'd like to make GIFs of personalized subjects, you can load your own SDXL-based LoRAs without having to fine-tune Hotshot-XL. This is awesome because it’s usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone's existing LoRA usage and workflows :) See more here.

Hotshot-XL is compatible with SDXL ControlNet to make GIFs in the composition/layout you’d like. See here for more info.

Hotshot-XL was trained to generate 1-second GIFs at 8 FPS.
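To make those numbers concrete, here is a minimal sketch of the arithmetic behind a clip of that length. The helper name `gif_timing` is hypothetical and is not part of the Hotshot-XL code base; only the 1-second duration and the 8 FPS rate come from the text above.

```python
# Values taken from the model card: 1-second clips at 8 frames per second.
DURATION_SECONDS = 1
FPS = 8


def gif_timing(duration_seconds: float, fps: int) -> tuple[int, int]:
    """Return (frame_count, per_frame_delay_ms) for a clip.

    Hypothetical helper for illustration only.
    """
    frame_count = round(duration_seconds * fps)  # total frames in the clip
    frame_delay_ms = round(1000 / fps)           # display time of each frame
    return frame_count, frame_delay_ms


frame_count, frame_delay_ms = gif_timing(DURATION_SECONDS, FPS)
print(frame_count, frame_delay_ms)  # 8 125
```

So a clip at the trained settings has 8 frames, each shown for 125 ms. The GIF format stores frame delays in hundredths of a second, so an encoder may round that per-frame value slightly when the file is written.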

Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. You can find an SDXL model we fine-tuned for 512x512 resolutions here.

Source code is available at https://github.com/hotshotco/Hotshot-XL.

Model Description

Developed by: Natural Synthetics Inc.
Model type: Diffusion-based text-to-GIF generative model
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify GIFs based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
Resources for more information: Check out our GitHub Repository.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.

Bias

While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.