Abstract
We describe *DeepSpeak*, a large-scale dataset of real and deepfake footage of people talking and gesturing in front of their webcams. The real videos in this first version of the dataset consist of 9 hours of footage from 220 diverse individuals. The fake videos, comprising more than 25 hours of footage, consist of a range of state-of-the-art face-swap and lip-sync deepfakes with natural and AI-generated voices. We expect to release future versions of this dataset with different and updated deepfake technologies. The dataset is made freely available for research and non-commercial use; requests for commercial use will be considered.
Community
The DeepSpeak dataset contains over 43 hours of real and deepfake footage of people talking and gesturing in front of their webcams. The source data was collected from a diverse set of participants in their natural environments and the deepfakes were generated using state-of-the-art open-source lip-sync and face-swap software. The dataset is available to the digital forensics research community via Hugging Face Datasets.
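For concreteness, below is a minimal sketch of loading the dataset with the Hugging Face `datasets` library. The repository ID `faridlab/deepspeak_v1` and the gated-access step are assumptions; consult the dataset page for the exact identifier and access terms.

```python
# Minimal sketch: loading DeepSpeak via the Hugging Face `datasets` library.
# The repository ID "faridlab/deepspeak_v1" is an assumption; check the
# dataset page for the exact ID. Access may be gated, in which case
# authenticate first with `huggingface-cli login`.
from datasets import load_dataset

ds = load_dataset("faridlab/deepspeak_v1")

# Inspect the available splits and the schema of the first split.
print(ds)
first_split = next(iter(ds))
print(ds[first_split].features)
```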
The following similar papers were recommended by the Semantic Scholar API (via the automated Librarian Bot):
- GM-DF: Generalized Multi-Scenario Deepfake Detection (2024)
- The Tug-of-War Between Deepfake Generation and Detection (2024)
- Exploring the Impact of Moiré Pattern on Deepfake Detectors (2024)
- COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark (2024)