cdminix committed on
Commit d8dc492
1 Parent(s): 6a53d44

update readme

Files changed (1)
  1. README.md +39 -33
README.md CHANGED
@@ -16,36 +16,42 @@ tags:
  - eval:generation
  ---

- # Start the configuration
-
- Most of the variables to change for a default leaderboard are in `src/env.py` (replace the paths with those of your leaderboard) and `src/about.py` (for the tasks).
-
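- For example, a minimal `src/env.py` might look like the sketch below (the variable names and values are illustrative assumptions and may not match the template exactly):
- ```python
- # src/env.py -- hypothetical example values for a default leaderboard
- import os
-
- OWNER = "your-org"                  # org or user that owns the leaderboard
- REPO_ID = f"{OWNER}/leaderboard"    # the space itself
- QUEUE_REPO = f"{OWNER}/requests"    # dataset repo holding request files
- RESULTS_REPO = f"{OWNER}/results"   # dataset repo holding results files
- TOKEN = os.environ.get("HF_TOKEN")  # write token for the repos above
- ```
-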
- Results files should have the following format and be stored as JSON files:
- ```json
- {
-     "config": {
-         "model_name": "name of the model",
-         "model_url": "url of the model",
-         "tags": ["tag1", "tag2"]
-     },
-     "results": {
-         "task_name": {
-             "metric_name": score
-         },
-         "task_name2": {
-             "metric_name": score
-         }
-     }
- }
- ```
- Example tags include "flow", "diffusion", "autoregressive", and "end-to-end".
-
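- As a sketch, a results file in this format could be written as follows (the task name, metric name, and file name here are made-up placeholders, not part of the template):
- ```python
- import json
-
- result = {
-     "config": {
-         "model_name": "my-tts-model",
-         "model_url": "https://huggingface.co/my-org/my-tts-model",
-         "tags": ["autoregressive"]
-     },
-     "results": {
-         "intelligibility": {"wer": 0.12}
-     }
- }
-
- # one JSON file per evaluated model
- with open("my-tts-model.json", "w") as f:
-     json.dump(result, f, indent=2)
- ```
-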
- Request files are created automatically by this tool.
-
- If you encounter a problem on the space, don't hesitate to restart it to remove the eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders it creates.
-
46
- # Code logic for more complex edits
47
-
48
- You'll find
49
- - the main table' columns names and properties in `src/display/utils.py`
50
- - the logic to read all results and request files, then convert them in dataframe lines, in `src/leaderboard/read_evals.py`, and `src/populate.py`
51
- - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
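-
- As a rough illustration of the read-and-populate step (a simplified sketch; the template's actual code does more, e.g. merging request metadata):
- ```python
- import glob
- import json
-
- import pandas as pd
-
- rows = []
- for path in glob.glob("eval-results/**/*.json", recursive=True):
-     with open(path) as f:
-         data = json.load(f)
-     row = {"model": data["config"]["model_name"]}
-     # flatten task -> metric scores into one column per metric
-     for task, metrics in data["results"].items():
-         for metric, value in metrics.items():
-             row[f"{task}/{metric}"] = value
-     rows.append(row)
-
- df = pd.DataFrame(rows)
- ```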
+ # TTSDS Benchmark
+
+ As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
+ However, traditional evaluation methods for TTS systems need an update to keep pace with these new developments.
+ Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
+ By comparing these factors with both real speech and noise datasets, we can better understand how synthetic speech stacks up.
+
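+ As a toy illustration of this idea (a minimal sketch in the spirit of the benchmark, not its actual implementation; the real factors use learned representations rather than the random 1-D features below):
+ ```python
+ import numpy as np
+ from scipy.stats import wasserstein_distance
+
+ rng = np.random.default_rng(0)
+ # stand-ins for a 1-D factor feature (e.g. pitch in Hz) extracted from
+ # synthetic speech, real speech, and a noise reference dataset
+ synthetic = rng.normal(120, 20, 1000)
+ real = rng.normal(118, 22, 1000)
+ noise = rng.normal(60, 50, 1000)
+
+ d_real = wasserstein_distance(synthetic, real)
+ d_noise = wasserstein_distance(synthetic, noise)
+ # normalize so that scores above 50 mean the synthetic distribution is
+ # closer to real speech than to noise
+ score = 100 * d_noise / (d_real + d_noise)
+ print(f"factor score: {score:.1f}")
+ ```
+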
+ ## More information
+ More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).
+
+ ## Reproducibility
+ To reproduce our results, check out our repository [here](https://github.com/ttsds/ttsds).
+
+ ## Credits
+
+ This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
+ Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
+ Additionally, our benchmark uses the following datasets:
+ - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+ - [LibriTTS](https://www.openslr.org/60/)
+ - [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
+ - [Common Voice](https://commonvoice.mozilla.org/)
+ - [ESC-50](https://github.com/karolpiczak/ESC-50)
+
+ It also relies on the following metrics, representations, and tools:
+ - [Wav2Vec2](https://arxiv.org/abs/2006.11477)
+ - [HuBERT](https://arxiv.org/abs/2106.07447)
+ - [WavLM](https://arxiv.org/abs/2110.13900)
+ - [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
+ - [VoiceFixer](https://arxiv.org/abs/2204.05841)
+ - [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
+ - [Whisper](https://arxiv.org/abs/2212.04356)
+ - [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
+ - [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
+ - [WeSpeaker](https://arxiv.org/abs/2210.17016)
+ - [D-Vector](https://github.com/yistLin/dvector)
+
+ Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell of the University of Edinburgh.