Reference of the Dataset
#1, opened by ChiaLingWeng
Hi, I'm conducting research on the Tao language, and it would be helpful to use this model. However, I'm wondering which corpus was used to train it, because I tried multiple words and the resulting audio sounds wrong. (I personally know a little Tao.)
Thanks!
See Section 3.1.1 of the paper: https://huggingface.co/papers/2305.13516
The MMS-lab dataset is based on recordings of people reading the New Testament in different languages. The New Testament consists of 27 books and a total of 260 chapters. Specifically, we obtain data from Faith Comes By Hearing, goto.bible and bible.com. This includes the original text data as well as the corresponding audio recording.