license: openrail++
tags:
- text-to-video
- stable-diffusion
Try Hotshot-XL yourself here: https://www.hotshot.co
Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.
Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:
- You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
- If you'd like to make GIFs of personalized subjects, you can load your own SDXL-based LoRAs without having to fine-tune Hotshot-XL. This is awesome because it's usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone's existing LoRA usage/workflows :) See more here.
Hotshot-XL is compatible with SDXL ControlNet to make GIFs in the composition/layout you’d like. See here for more info.
Hotshot-XL was trained to generate 1-second GIFs at 8 FPS (8 frames per GIF).
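As a quick illustration of what those numbers imply, the sketch below works out the frame count and the per-frame delay for a clip of a given length and frame rate. The helper name `gif_timing` is hypothetical and is not part of the Hotshot-XL code base; it only does the arithmetic.

```python
def gif_timing(duration_s: float = 1.0, fps: int = 8) -> tuple:
    """Return (frame_count, frame_delay_ms) for a clip of the given length.

    Hypothetical helper for illustration only; the defaults mirror the
    1-second, 8 FPS setting the model card says Hotshot-XL was trained on.
    """
    frame_count = round(duration_s * fps)   # total frames in the clip
    frame_delay_ms = round(1000 / fps)      # display time of each frame
    return frame_count, frame_delay_ms


frames, delay_ms = gif_timing()
print(f"{frames} frames, {delay_ms} ms per frame")
```

The per-frame delay is the kind of value a GIF writer typically expects when you assemble frames yourself.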
Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. You can find an SDXL model we fine-tuned for 512x512 resolutions here.
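If you want a non-square output while staying close to the 512x512 pixel budget recommended above, one option is to hold the pixel count roughly constant and vary the aspect ratio. The sketch below does that and snaps each side to a multiple of 8. Both the helper `size_for_aspect_ratio` and the multiple-of-8 snapping are assumptions made for illustration; this is not the list of resolutions Hotshot-XL was actually trained on, so check the GitHub repository for the supported sizes.

```python
import math


def size_for_aspect_ratio(aspect_ratio: float,
                          target_area: int = 512 * 512,
                          multiple: int = 8) -> tuple:
    """Pick (width, height) with about target_area pixels for a width/height ratio.

    Hypothetical helper: sides are snapped to `multiple`, which is an
    assumption here, not a documented Hotshot-XL requirement.
    """
    height = math.sqrt(target_area / aspect_ratio)
    width = height * aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


for ratio in (1.0, 16 / 9, 9 / 16):
    print(ratio, size_for_aspect_ratio(ratio))
```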
Source code is available at https://github.com/hotshotco/Hotshot-XL.
Model Description
- Developed by: Natural Synthetics Inc.
- Model type: Diffusion-based text-to-GIF generative model
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify GIFs based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- Resources for more information: Check out our GitHub Repository.
Limitations and Bias
Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.
Bias
While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.