cdminix committed on
Commit
cd25c30
1 Parent(s): 27a8f27
Files changed (1)
  1. src/texts.py +34 -2
src/texts.py CHANGED
@@ -1,11 +1,43 @@
LLM_BENCHMARKS_TEXT = f"""
- ## How it works
- Check out our website [here](https://ttsdsbenchmark.com).
+ # About
+
+ As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
+ However, traditional evaluation methods for TTS systems need an update to keep pace with these new developments.
+ Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
+ By comparing these factors with both real speech and noise datasets, we can better understand how synthetic speech stacks up.
+
+ ## More information
More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).

## Reproducibility
To reproduce our results, check out our repository [here](https://github.com/ttsds/ttsds).

+ ## Credits
+
+ This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
+ Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
+ Additionally, our benchmark uses the following datasets:
+ - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+ - [LibriTTS](https://www.openslr.org/60/)
+ - [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
+ - [Common Voice](https://commonvoice.mozilla.org/)
+ - [ESC-50](https://github.com/karolpiczak/ESC-50)
+ And the following metrics/representations/tools:
+ - [Wav2Vec2](https://arxiv.org/abs/2006.11477)
+ - [HuBERT](https://arxiv.org/abs/2106.07447)
+ - [WavLM](https://arxiv.org/abs/2110.13900)
+ - [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
+ - [VoiceFixer](https://arxiv.org/abs/2204.05841)
+ - [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
+ - [Whisper](https://arxiv.org/abs/2212.04356)
+ - [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
+ - [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
+ - [WeSpeaker](https://arxiv.org/abs/2210.17016)
+ - [D-Vector](https://github.com/yistLin/dvector)
+
+ Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell
+ of the University of Edinburgh.
"""

EVALUATION_QUEUE_TEXT = """
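
The new About text describes comparing factor distributions of synthetic speech against both real speech and noise datasets. As a rough illustration of that idea, here is a minimal Python sketch; the feature choice, distance measure, and normalization are simplified assumptions for this example, not the paper's exact scoring method:

```python
# Toy illustration of the distributional idea behind TTSDS (not the
# paper's exact formula): a 1-D feature (e.g. pitch) is extracted from
# synthetic, real, and noise audio, and the synthetic distribution is
# scored by how much closer it sits to real speech than to noise.
import numpy as np
from scipy.stats import wasserstein_distance

def factor_score(synthetic: np.ndarray, real: np.ndarray, noise: np.ndarray) -> float:
    """Score in [0, 100]; higher means closer to real speech than to noise."""
    d_real = wasserstein_distance(synthetic, real)    # distance to real speech
    d_noise = wasserstein_distance(synthetic, noise)  # distance to noise reference
    return 100 * d_noise / (d_real + d_noise)

rng = np.random.default_rng(0)
real = rng.normal(120.0, 20.0, 1000)       # e.g. pitch values from real speech
noise = rng.uniform(0.0, 500.0, 1000)      # the same feature from noise audio
synthetic = rng.normal(125.0, 25.0, 1000)  # the same feature from a TTS system
print(f"factor score: {factor_score(synthetic, real, noise):.1f}")
```

A benchmark along these lines would aggregate such per-factor scores (prosody, speaker identity, intelligibility) into an overall result; the actual implementation is in the repository linked under Reproducibility.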