---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_12_0
language:
- fr
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

Trained on the full Common Voice dataset

The word error rates (WER, %) on the test set are:

| decoding method      | chunk size |  test  |        comment       |     decoding mode    |
| -------------------- | ---------- | ------ | -------------------- | -------------------- |
| greedy search        | 640ms      | 10.90  | --epoch 30 --avg 9   | simulated streaming  |
| modified beam search | 640ms      | 10.55  | --epoch 30 --avg 9   | simulated streaming  |
| fast beam search     | 640ms      | 10.75  | --epoch 30 --avg 9   | simulated streaming  |
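For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is typically computed: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and this is not necessarily the scoring script that produced the numbers in this card; text normalization in particular may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent, using word-level Levenshtein distance.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(round(word_error_rate("le chat dort", "le chien dort"), 2))
```

Corpus-level WER is usually obtained by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.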

Trained on full LibriSpeech, then fine-tuned on the full Common Voice dataset

The WERs (%) on the test set are:

| decoding method      | chunk size |  test  |        comment       |     decoding mode    |
| -------------------- | ---------- | ------ | -------------------- | -------------------- |
| greedy search        | 640ms      | 10.57  | --epoch 29 --avg 9   | simulated streaming  |
| modified beam search | 640ms      | 10.19  | --epoch 29 --avg 9   | simulated streaming  |
| fast beam search     | 640ms      | 10.25  | --epoch 29 --avg 9   | simulated streaming  |

Trained on full LibriSpeech and GigaSpeech, then fine-tuned on the full Common Voice dataset

The WERs (%) on the test set are:

| decoding method      | chunk size |  test  |        comment       |     decoding mode    |
| -------------------- | ---------- | ------ | -------------------- | -------------------- |
| greedy search        | 640ms      | 9.95   | --epoch 30 --avg 9   | simulated streaming  |
| modified beam search | 640ms      | 9.57   | --epoch 30 --avg 9   | simulated streaming  |
| fast beam search     | 640ms      | 9.67   | --epoch 30 --avg 9   | simulated streaming  |
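To compare the three training setups, the relative WER reduction can be computed directly from the tables. The sketch below uses only the test WERs reported in this card; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions.

```python
# Test WERs (%) copied from the tables in this card.
WERS = {
    "greedy search": {
        "commonvoice": 10.90,
        "librispeech+commonvoice": 10.57,
        "librispeech+gigaspeech+commonvoice": 9.95,
    },
    "modified beam search": {
        "commonvoice": 10.55,
        "librispeech+commonvoice": 10.19,
        "librispeech+gigaspeech+commonvoice": 9.57,
    },
    "fast beam search": {
        "commonvoice": 10.75,
        "librispeech+commonvoice": 10.25,
        "librispeech+gigaspeech+commonvoice": 9.67,
    },
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for method, results in WERS.items():
        baseline = results["commonvoice"]
        for setup, wer in results.items():
            if setup == "commonvoice":
                continue
            gain = relative_reduction(baseline, wer)
            print(f"{method:22s} {setup:36s} {wer:5.2f}  (-{gain:.1f}% relative)")
```

Under this calculation, pretraining on LibriSpeech and GigaSpeech before fine-tuning lowers the WER by roughly 9 to 10% relative to training on Common Voice alone, across all three decoding methods.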