Phenaki: Variable Length Video Generation From Open Domain Textual Description
Paper: arXiv:2210.02399
MaskGiT is trained to reconstruct masked video tokens z, which are produced by a frozen C-ViViT encoder, conditioned on T5X tokens of a given text prompt p0.
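
The masking step of this training objective can be illustrated with a minimal sketch. It is not the authors' implementation: the sentinel `MASK_ID`, the helper names `mask_tokens` and `reconstruction_targets`, the fixed mask ratio, and the token ids are all assumptions made for illustration. The transformer itself and the conditioning on the prompt p0 are omitted; the sketch only shows which positions of z are hidden and which original tokens the model would be asked to recover.

```python
import random

MASK_ID = -1  # hypothetical sentinel standing in for a [MASK] token id


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token ids with MASK_ID.

    Returns the masked sequence and the sorted list of masked positions.
    """
    n = len(tokens)
    num_masked = max(1, round(mask_ratio * n))
    positions = sorted(rng.sample(range(n), num_masked))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_ID
    return masked, positions


def reconstruction_targets(tokens, positions):
    """Map each masked position to the original token the model must predict."""
    return {i: tokens[i] for i in positions}


# Made-up token ids standing in for z from the frozen encoder.
rng = random.Random(0)
z = [17, 4, 256, 91, 33, 8, 120, 64]
masked_z, positions = mask_tokens(z, mask_ratio=0.5, rng=rng)
targets = reconstruction_targets(z, positions)
```

In a real training loop, the model would receive `masked_z` together with the text conditioning and be scored only on the positions listed in `targets`.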