update readme
README.md
CHANGED
@@ -16,36 +16,42 @@ tags:
- eval:generation
---

# TTSDS Benchmark
As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
However, traditional evaluation methods for TTS systems need an update to keep pace with these new developments.
Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
By comparing these factors with both real speech and noise datasets, we can better understand how synthetic speech stacks up.
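
The per-factor comparison can be pictured as asking whether the distribution of a feature extracted from the synthetic speech sits closer to that feature's distribution over real speech or over noise. The sketch below only illustrates that idea and is not the benchmark's actual implementation; the 1-D pitch-like feature, the use of SciPy's `wasserstein_distance`, and the exact normalization are assumptions made here for clarity (see the paper and repository linked below for the real formulation).

```python
# Illustrative sketch of a per-factor score (not the official TTSDS code):
# a system scores high when its feature distribution is closer to the real
# reference than to noise. The feature is a 1-D stand-in, e.g. pitch values.
import numpy as np
from scipy.stats import wasserstein_distance


def factor_score(synthetic, real, noise):
    """Return a score in [0, 100]; higher means closer to real speech than to noise."""
    d_real = wasserstein_distance(synthetic, real)
    d_noise = wasserstein_distance(synthetic, noise)
    return 100.0 * d_noise / (d_real + d_noise + 1e-12)


rng = np.random.default_rng(0)
real = rng.normal(120, 20, 1000)       # pitch-like values from real speech
noise = rng.uniform(0, 400, 1000)      # pitch-like values from a noise dataset
synthetic = rng.normal(125, 25, 1000)  # values from a fairly natural TTS system

print(f"factor score: {factor_score(synthetic, real, noise):.1f}")  # high, near 100
```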
## More information
More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).
## Reproducibility
To reproduce our results, check out our repository [here](https://github.com/ttsds/ttsds).
## Credits

This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
Additionally, our benchmark uses the following datasets:
- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
- [LibriTTS](https://www.openslr.org/60/)
- [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
- [Common Voice](https://commonvoice.mozilla.org/)
- [ESC-50](https://github.com/karolpiczak/ESC-50)

And the following metrics/representations/tools (an illustrative feature-extraction sketch follows the list):
- [Wav2Vec2](https://arxiv.org/abs/2006.11477)
- [HuBERT](https://arxiv.org/abs/2106.07447)
- [WavLM](https://arxiv.org/abs/2110.13900)
- [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
- [VoiceFixer](https://arxiv.org/abs/2204.05841)
- [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
- [Whisper](https://arxiv.org/abs/2212.04356)
- [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
- [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
- [WeSpeaker](https://arxiv.org/abs/2210.17016)
- [D-Vector](https://github.com/yistLin/dvector)
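
As a concrete illustration of the kind of representation extraction these tools provide, the sketch below pulls frame-level Wav2Vec2 features with the Hugging Face `transformers` library; the checkpoint name, the resampling step, and the mean pooling are assumptions chosen for the example rather than the benchmark's exact configuration.

```python
# Illustrative sketch: frame-level Wav2Vec2 representations via transformers.
# "sample.wav" is a placeholder path; any mono speech clip works.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

waveform, sr = torchaudio.load("sample.wav")                # (channels, samples)
waveform = torchaudio.functional.resample(waveform, sr, 16000).mean(dim=0)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state              # (1, frames, 768)

# One simple utterance-level summary: average the frame representations.
utterance_embedding = hidden.mean(dim=1).squeeze(0)
print(utterance_embedding.shape)                            # torch.Size([768])
```

In the benchmark itself, distributions of such features from a TTS system are compared against the real-speech and noise datasets listed above.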

Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell of the University of Edinburgh.