Reference of the Dataset

by ChiaLingWeng - opened Mar 10

Mar 10

Hi, I'm conducting research related to the Tao language, and it would be helpful to use this model. However, I'm wondering which corpus was used to train it, as I tried multiple words and the resulting audio sounds wrong. (I personally know a little Tao.)
Thanks!

sanchit-gandhi

Mar 11

See Section 3.1.1 of the paper: https://huggingface.co/papers/2305.13516

The MMS-lab dataset is based on recordings of people reading the New Testament in different languages. The New Testament consists of 27 books and a total of 260 chapters. Specifically, we obtain data from Faith Comes By Hearing, goto.bible and bible.com. This includes the original text data as well as the corresponding audio recording.
