arxiv:2306.17203

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Published on Jun 29, 2023

Abstract

The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous V2A methods have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features in a spectrogram latent space. The CAVP-aligned features enable the LDM to capture subtler audio-visual correlations via a cross-attention module. We further significantly improve sample quality with "double guidance". Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project Page: https://diff-foley.github.io/
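
The abstract does not spell out the CAVP objective, so the following is only a generic illustration of what a contrastive audio-visual loss can look like: a symmetric InfoNCE loss over paired video and audio embeddings. The function names, the temperature value of 0.07, and the toy 2-dimensional features are all assumptions made for this sketch and are not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(video_feats, audio_feats, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss.

    video_feats[i] and audio_feats[i] are assumed to come from the same
    clip (a positive pair); every other combination in the batch is
    treated as a negative. The temperature is an assumed value.
    """
    v = [l2_normalize(x) for x in video_feats]
    a = [l2_normalize(x) for x in audio_feats]
    n = len(v)

    # Cosine similarity between every video and every audio embedding.
    logits = [
        [sum(p * q for p, q in zip(v[i], a[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Video-to-audio direction: each row should peak on its diagonal entry.
    v2a = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Audio-to-video direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    a2v = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n

    return 0.5 * (v2a + a2v)


# Toy features: aligned pairs should score a lower loss than shuffled ones.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned_audio = [[1.0, 0.0], [0.0, 1.0]]
shuffled_audio = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(video, aligned_audio)
loss_shuffled = symmetric_contrastive_loss(video, shuffled_audio)
```

In this toy setup the loss is small when each video embedding matches its own audio embedding and large when the pairs are swapped, which is the behavior a contrastive pretraining stage relies on to align the two modalities.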
