[Average WER Calculation] Drop Common Voice WER.
Hey hey!
Starting this thread to discuss dropping CV WER from the Avg. WER calculation. The reason for removing it is as follows:
Common Voice (CV) does not preserve its train and test splits across releases. Each new CV release essentially re-samples the train, test, and validation splits at random, so a clip in the test split of one release can end up in the train split of a later one. This results in test data leakage for models trained on later releases of CV.
ref: https://discourse.mozilla.org/t/how-are-the-dev-test-train-datasets-split/36381/4
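To make the leakage concrete, here is a minimal sketch of how one could check it, assuming each clip has a stable identifier (e.g. its audio file name) shared across releases. The function names and clip IDs are hypothetical and purely illustrative, not real Common Voice data:

```python
from typing import Iterable, Set


def leaked_clips(old_test_ids: Iterable[str], new_train_ids: Iterable[str]) -> Set[str]:
    # Clips used for evaluation in an older release's test split that a model
    # trained on a newer release's train split may already have seen.
    return set(old_test_ids) & set(new_train_ids)


def leak_rate(old_test_ids: Iterable[str], new_train_ids: Iterable[str]) -> float:
    # Fraction of the older test split that reappears in the newer train split.
    test = set(old_test_ids)
    if not test:
        return 0.0
    return len(test & set(new_train_ids)) / len(test)


# Toy example with made-up clip names (hypothetical, not real CV clips).
old_test = ["clip_001.mp3", "clip_002.mp3", "clip_003.mp3", "clip_004.mp3"]
new_train = ["clip_002.mp3", "clip_004.mp3", "clip_010.mp3"]
print(sorted(leaked_clips(old_test, new_train)))  # ['clip_002.mp3', 'clip_004.mp3']
print(leak_rate(old_test, new_train))  # 0.5
```

Any non-zero overlap means the test WER for such a model is no longer a held-out measurement.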
I've recomputed the WERs; you can find them in the datasets below:
Without CV: https://huggingface.co/datasets/reach-vb/open-asr-leaderboard-evals-ex-cv/viewer
With CV: https://huggingface.co/datasets/reach-vb/open-asr-leaderboard-evals-all
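For illustration, a minimal sketch of what dropping CV from the average means, assuming the Avg. WER is an unweighted mean of per-dataset WERs (an assumption; the leaderboard's exact aggregation may differ). The dataset keys and WER values are hypothetical:

```python
from statistics import mean


def average_wer(wer_by_dataset: dict[str, float], exclude: frozenset[str] = frozenset()) -> float:
    # Unweighted mean over the per-dataset WERs, skipping excluded datasets.
    kept = [wer for name, wer in wer_by_dataset.items() if name not in exclude]
    if not kept:
        raise ValueError("no datasets left to average")
    return mean(kept)


# Made-up per-dataset WERs (%) for a single model; not leaderboard numbers.
wers = {"librispeech_clean": 2.0, "librispeech_other": 4.0, "common_voice": 12.0}
print(average_wer(wers))  # 6.0
print(average_wer(wers, exclude=frozenset({"common_voice"})))  # 3.0
```

Under this assumption, a leaked CV score pulls a model's average (and therefore its rank) toward whatever its possibly contaminated CV WER happens to be, which is why excluding it changes the ranking.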
What do we think? @smajumdar94 @NithinK @sanchit-gandhi
Cheers!
VB
+1, agree with the above - it's impossible to guarantee no data leakage if we include the MCV series. Would be in favour of removing this from the overall score, but maybe still keeping it as a column in the leaderboard with an asterisk to highlight that it's a leaked dataset
I wonder if this will cause some confusion in the community?
True - unless it was a "hidden" column that you expanded out upon click, but I think even this is too complex. Happy to exclude it and add a note in the README explaining why the dataset was removed post-release
I would prefer to remove the MCV numbers from the leaderboard altogether, since we are not using them for the average WER calculation or the ranking.