duplicated_from: diffusers/text-to-video-ms-1.7b
---
# Text-to-video-synthesis Model in Open Domain

This model is based on a multi-stage text-to-video generation diffusion model: it takes a text description as input and returns a video that matches the description. Only English input is supported.

**We Are Hiring!** (Based in Beijing / Hangzhou, China.)

If you're looking for an exciting challenge and the opportunity to work with cutting-edge technologies in AIGC and large-scale pretraining, we are the place for you. We are looking for talented, motivated, and creative individuals to join our team. If you are interested, please send your CV to us.

EMAIL: [email protected]
## Model description
The text-to-video generation diffusion model consists of three sub-networks: a text feature extraction model, a text-feature-to-video latent-space diffusion model, and a video latent-space-to-video visual-space model. The overall model has about 1.7 billion parameters. Currently, it only supports English input. The diffusion model adopts a UNet3D structure and implements video generation through an iterative denoising process starting from pure Gaussian-noise video.
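Since this repository is duplicated from `diffusers/text-to-video-ms-1.7b`, the checkpoint should load through the Hugging Face `diffusers` pipeline API. Below is a minimal sketch rather than an official recipe: the repo id is an assumption taken from the `duplicated_from` field above, it requires `torch`, `diffusers`, and `accelerate` to be installed, and the shape of the returned frames varies across `diffusers` versions.

```python
# A minimal sketch (not the official usage) of running this checkpoint
# with the diffusers pipeline API. The repo id below is assumed from the
# "duplicated_from" field above; adjust it to the actual model path.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "diffusers/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # lowers peak GPU memory at some speed cost

prompt = "An astronaut riding a horse."  # example prompt; English-only input
# Recent diffusers versions return a batch of videos, hence .frames[0];
# older versions returned the frame list directly.
video_frames = pipe(prompt, num_inference_steps=25).frames[0]
video_path = export_to_video(video_frames)
print(video_path)
```

The `num_inference_steps` value trades generation quality against latency; fewer denoising iterations run faster but tend to produce blurrier videos.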