Reference of the Dataset

#1
by ChiaLingWeng - opened

Hi, I'm conducting research related to the Tao language, and it would be helpful to use this model. However, I'm wondering which corpus was used to train it, as I tried multiple words and the resulting audio sounds wrong. (I personally know a little Tao.)
Thanks!

See Section 3.1.1 of the paper: https://huggingface.co/papers/2305.13516

The MMS-lab dataset is based on recordings of people reading the New Testament in different languages. The New Testament consists of 27 books and a total of 260 chapters. Specifically, we obtain data from Faith Comes By Hearing, goto.bible and bible.com. This includes the original text data as well as the corresponding audio recording.
