EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Zhiyuan Chen\*, Jiajiong Cao\*, Zhiquan Chen, Yuming Li, Chenguang Ma
*Equal Contribution.
Terminal Technology Department, Alipay, Ant Group.
## Model Files

```
./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └──
```

Some of the models in this hub can also be downloaded directly from their original hubs:

- sd-vae-ft-mse: weights intended to be used with the diffusers library (_thanks to stabilityai_).
- sd-image-variations-diffusers
- audio_processor

## Gallery

### Audio Driven (Sing)
### Audio Driven (English)
### Audio Driven (Chinese)
### Landmark Driven
### Audio + Selected Landmark Driven
**(Some demo images above are sourced from image websites. If there is any infringement, we will remove them immediately and apologize.)**

## Citation

If you find our work useful for your research, please consider citing the paper:

```
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  eprint={2406.01900},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```