# AudioCraft objective metrics

In addition to training losses, AudioCraft provides a set of objective metrics
for audio synthesis and audio generation. As these metrics may require
extra dependencies and can be costly to compute, they are often disabled by default.
This section provides guidance for setting up and using these metrics in
the AudioCraft training pipelines.

## Available metrics

### Audio synthesis quality metrics

#### SI-SNR

We provide an implementation of the Scale-Invariant Signal-to-Noise Ratio (SI-SNR) in PyTorch.
No extra dependency is needed for this metric. Please activate the metric at the
evaluation stage with the appropriate flag:

**Warning:** We report the opposite of the SI-SNR, i.e., the score multiplied by -1. This is due to internal
details: the SI-SNR score is also used as a training loss function, where lower
values should indicate better reconstruction. Negative values are thus expected and a good sign! They should be multiplied by `-1` again before publication :)

```shell
dora run <...> evaluate.metrics.sisnr=true
```
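For intuition, here is a minimal PyTorch sketch of the SI-SNR computation and the sign convention described above. It is illustrative only and not the exact implementation shipped in AudioCraft; the function name `neg_si_snr` is ours:

```python
import torch

def neg_si_snr(estimate: torch.Tensor, reference: torch.Tensor,
               eps: float = 1e-8) -> torch.Tensor:
    """Negated SI-SNR over the last dimension (lower is better, as in a loss)."""
    # Zero-mean both signals so the metric is invariant to DC offsets.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    reference = reference - reference.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to obtain the scaled target.
    dot = (estimate * reference).sum(dim=-1, keepdim=True)
    energy = (reference ** 2).sum(dim=-1, keepdim=True) + eps
    target = dot / energy * reference
    noise = estimate - target
    si_snr = 10 * torch.log10(
        (target ** 2).sum(dim=-1) / ((noise ** 2).sum(dim=-1) + eps) + eps)
    return -si_snr  # negated, per the warning above

# A nearly perfect reconstruction yields a high SI-SNR, hence a very
# negative reported value.
reference = torch.randn(1, 16000)
estimate = reference + 0.01 * torch.randn(1, 16000)
print(neg_si_snr(estimate, reference))
```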
#### ViSQOL

We provide a Python wrapper around the ViSQOL [official implementation](https://github.com/google/visqol)
to conveniently run ViSQOL within the training pipelines.

One must specify the path to the ViSQOL installation through the configuration in order
to enable ViSQOL computations in AudioCraft:

```shell
# the first flag activates ViSQOL computation while the second specifies
# the path to the ViSQOL binary to be used by our Python wrapper
dora run <...> evaluate.metrics.visqol=true metrics.visqol.bin=<path_to_visqol>
```
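Conceptually, the wrapper shells out to the compiled ViSQOL binary and parses its output. Below is a hedged sketch of that pattern, assuming the `--reference_file` and `--degraded_file` flags and the `MOS-LQO` output line documented in the google/visqol README; check your build for the exact interface:

```python
import re
import subprocess

def visqol_mos_lqo(visqol_bin: str, reference_wav: str, degraded_wav: str) -> float:
    """Run a ViSQOL binary on a reference/degraded pair and parse the MOS-LQO score."""
    result = subprocess.run(
        [visqol_bin, "--reference_file", reference_wav, "--degraded_file", degraded_wav],
        capture_output=True, text=True, check=True,
    )
    # ViSQOL prints a line such as "MOS-LQO: 4.30" on stdout.
    match = re.search(r"MOS-LQO:\s*([0-9.]+)", result.stdout)
    if match is None:
        raise RuntimeError(f"Could not parse ViSQOL output:\n{result.stdout}")
    return float(match.group(1))
```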
See an example grid: [Compression with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py)

To learn more about ViSQOL and how to build the ViSQOL binary using Bazel, please refer to the
instructions available in the [open source repository](https://github.com/google/visqol).
### Audio generation metrics

#### Frechet Audio Distance

Similarly to ViSQOL, we use a Python wrapper around the Frechet Audio Distance
[official implementation](https://github.com/google-research/google-research/tree/master/frechet_audio_distance)
in TensorFlow.

Note that we had to make several changes to the original code in order to make it work.
Please refer to the [FrechetAudioDistanceMetric](../audiocraft/metrics/fad.py) class documentation
for more details. We do not plan to provide further support for obtaining a working setup of the
Frechet Audio Distance at this stage.

```shell
# the first flag activates FAD metric computation while the second specifies
# the path to the FAD library to be used by our Python wrapper
dora run <...> evaluate.metrics.fad=true metrics.fad.bin=<path_to_google_research_repository>
```
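Independently of that wrapper, it may help to see what the metric computes: the Frechet distance between Gaussians fitted to embeddings (VGGish embeddings in the official implementation) of the reference and generated sets. A conceptual NumPy/SciPy sketch, not the official TensorFlow code:

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two sets of embeddings,
    each of shape (num_examples, embedding_dim)."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    sigma_a = np.cov(emb_a, rowvar=False)
    sigma_b = np.cov(emb_b, rowvar=False)
    # sqrtm can return small imaginary parts due to numerical error.
    covmean = linalg.sqrtm(sigma_a @ sigma_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a + sigma_b - 2 * covmean))
```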
See an example grid: [Evaluation with FAD](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py)
#### Kullback-Leibler Divergence

We provide a PyTorch implementation of the Kullback-Leibler Divergence computed over the probabilities
of the labels obtained from a state-of-the-art audio classifier. Our implementation of the KLD
relies on the [PaSST classifier](https://github.com/kkoutini/PaSST).

In order to use the KLD metric over PaSST, you must install the PaSST library as an extra dependency:

```shell
pip install 'git+https://github.com/kkoutini/[email protected]#egg=hear21passt'
```

Then, you can enable the metric at the evaluation stage with the corresponding flag:

```shell
# the KLD metric can be extended with additional audio classifier models
# that can then be picked through the configuration
dora run <...> evaluate.metrics.kld=true metrics.kld.model=passt
```
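For intuition, the metric compares the label distributions that the classifier assigns to the reference and generated audio. A minimal sketch of that computation, where `classifier` stands in for any model returning class logits (this is not the exact AudioCraft implementation):

```python
import torch
import torch.nn.functional as F

def kld_over_labels(classifier, reference: torch.Tensor,
                    generated: torch.Tensor) -> torch.Tensor:
    """KL(P_ref || P_gen) between classifier label distributions, batch-averaged."""
    with torch.no_grad():
        p_ref = F.softmax(classifier(reference), dim=-1)
        log_p_gen = F.log_softmax(classifier(generated), dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_gen, p_ref, reduction="batchmean")
```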
#### Text consistency

We provide a text consistency metric, similar to the MuLan Cycle Consistency from
[MusicLM](https://arxiv.org/pdf/2301.11325.pdf) or the CLAP score used in
[Make-An-Audio](https://arxiv.org/pdf/2301.12661v1.pdf).
More specifically, we provide a PyTorch implementation of a text consistency metric
relying on a pre-trained [Contrastive Language-Audio Pretraining (CLAP)](https://github.com/LAION-AI/CLAP) model.

Please install the CLAP library as an extra dependency prior to using the metric:

```shell
pip install laion_clap
```

Then, you can enable the metric at the evaluation stage with the corresponding flag:

```shell
# the text consistency metric can be extended with additional audio classifier models
# that can then be picked through the configuration
dora run <...> evaluate.metrics.text_consistency=true metrics.text_consistency.model=clap
```
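As an illustration of what the metric measures, here is a hedged sketch of a CLAP-based text consistency score: the cosine similarity between CLAP embeddings of each text description and its generated audio. It follows the `laion_clap` README API, which may differ across versions, and the file names are hypothetical:

```python
import numpy as np
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()  # loads a pretrained CLAP checkpoint

descriptions = ["happy rock", "sad piano melody"]  # text prompts
audio_files = ["sample_0.wav", "sample_1.wav"]     # hypothetical generated samples

text_emb = model.get_text_embedding(descriptions)
audio_emb = model.get_audio_embedding_from_filelist(x=audio_files)

# Cosine similarity between each description and its matching sample.
text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
audio_emb = audio_emb / np.linalg.norm(audio_emb, axis=-1, keepdims=True)
print((text_emb * audio_emb).sum(axis=-1))  # higher means better text-audio match
```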
Note that the text consistency metric based on CLAP will require the CLAP checkpoint to be
provided in the configuration.
#### Chroma cosine similarity

Finally, as introduced in MusicGen, we provide a Chroma Cosine Similarity metric in PyTorch.
No extra dependency is needed for this metric. Please activate the metric at the
evaluation stage with the appropriate flag:

```shell
dora run <...> evaluate.metrics.chroma_cosine=true
```
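For intuition, the metric compares the chromagrams (12 pitch-class energy profiles) of the generated audio and the reference melody, frame by frame. A rough sketch using librosa chroma features, which is not the exact AudioCraft implementation:

```python
import librosa
import numpy as np

def chroma_cosine_similarity(ref_wav: np.ndarray, gen_wav: np.ndarray, sr: int) -> float:
    """Mean frame-wise cosine similarity between chromagrams (12 pitch classes)."""
    ref_chroma = librosa.feature.chroma_stft(y=ref_wav, sr=sr)  # shape (12, frames)
    gen_chroma = librosa.feature.chroma_stft(y=gen_wav, sr=sr)
    # Align the two chromagrams on the shorter duration.
    frames = min(ref_chroma.shape[1], gen_chroma.shape[1])
    ref_chroma, gen_chroma = ref_chroma[:, :frames], gen_chroma[:, :frames]
    num = (ref_chroma * gen_chroma).sum(axis=0)
    den = np.linalg.norm(ref_chroma, axis=0) * np.linalg.norm(gen_chroma, axis=0) + 1e-8
    return float((num / den).mean())
```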
#### Comparing against reconstructed audio

For all the above audio generation metrics, we offer the option to compute the metric on the
ground-truth audio reconstructed through EnCodec, instead of on the generated sample, using the
flag `<metric>.use_gt=true` (e.g. `metrics.fad.use_gt=true`).
## Example usage

You will find example configurations for the different metrics introduced above in:
* The [MusicGen default solver](../config/solver/musicgen/default.yaml) for all audio generation metrics
* The [compression default solver](../config/solver/compression/default.yaml) for all audio synthesis metrics

Similarly, we provide different examples in our grids:
* [Evaluation with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py)
* [Evaluation with FAD and others](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py)