---
license: apache-2.0
pipeline_tag: text-to-video
---

## Latte: Latent Diffusion Transformer for Video Generation

This repo contains the pre-trained text-to-video generation weights for Latte, our paper exploring latent diffusion models with transformers. More visualizations are available on our [project page](https://maxin-cn.github.io/latte_project/).
For pre-trained weights on FaceForensics, SkyTimelapse, UCF101, and Taichi-HD, see the [Latte repository](https://huggingface.co/maxin-cn/Latte).

## News
- (🔥 New) May 23, 2024. 💥 **Latte-1** for text-to-video generation is released! You can download the pre-trained model [here](https://huggingface.co/maxin-cn/LatteT2V/tree/main/transformer_v1). Latte-1 also supports text-to-image generation; to try it, run `bash sample/t2i.sh`.

- (🔥 New) Mar. 20, 2024. 💥 An updated LatteT2V model is coming soon, stay tuned!

- (🔥 New) Feb. 24, 2024. 💥 We are very grateful that researchers and developers like our work. We will continue to update our LatteT2V model and hope our efforts help the community. We have created a Latte [Discord](https://discord.gg/RguYqhVU92) channel for discussions. Coders are welcome to contribute.

- (🔥 New) Jan. 9, 2024. 💥 An updated LatteT2V model initialized with [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) is released. The checkpoint can be found [here](https://huggingface.co/maxin-cn/LatteT2V/tree/main/transformer).

- (🔥 New) Oct. 31, 2023. 💥 The training and inference code is released. All checkpoints (including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD) can be found [here](https://huggingface.co/maxin-cn/Latte/tree/main). In addition, the LatteT2V inference code is provided.
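
The checkpoint links above point at subfolders (`transformer_v1`, `transformer`) of the `maxin-cn/LatteT2V` repository. As a minimal sketch, assuming the `huggingface_hub` package is installed, one subfolder could be fetched as shown here. The helper names `allow_patterns_for` and `download_weights` and the `./LatteT2V` target directory are hypothetical and not part of the Latte codebase; any other components the inference scripts may need are not covered.

```python
from pathlib import Path

# Repository named in the checkpoint links of this README.
REPO_ID = "maxin-cn/LatteT2V"


def allow_patterns_for(subfolder: str) -> list[str]:
    """Return glob patterns that restrict a download to one repo subfolder."""
    return [f"{subfolder.strip('/')}/*"]


def download_weights(subfolder: str = "transformer_v1",
                     local_dir: str = "./LatteT2V") -> Path:
    """Download only `subfolder` of the repo and return the local path.

    Hypothetical helper: requires `pip install huggingface_hub` and
    network access when called.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(subfolder),
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_weights()` would fetch the Latte-1 weights, while `download_weights("transformer")` would fetch the earlier PixArt-α-initialized checkpoint. Restricting the download with `allow_patterns` avoids pulling every file in the repository.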

## Contact Us
**Yaohui Wang**: [[email protected]](mailto:[email protected])
**Xin Ma**: [[email protected]](mailto:[email protected])

## Citation
If you find this work useful for your research, please consider citing it.
```bibtex
@article{ma2024latte,
  title={Latte: Latent Diffusion Transformer for Video Generation},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Liu, Ziwei and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2401.03048},
  year={2024}
}
```

Paper: https://huggingface.co/papers/2401.03048

## Acknowledgments
Latte has been greatly inspired by the following amazing works and teams: [DiT](https://github.com/facebookresearch/DiT) and [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). We thank all the contributors for open-sourcing their work.