---
datasets:
- mozilla-foundation/common_voice_16_1
language:
- ta
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

This model is fine-tuned on the Tamil subset of Common Voice 16.1. The transcripts were preprocessed with Epitran, which transliterates text into IPA. Epitran's `tam-Taml` language code was used to generate the phoneme list, which covers the following Tamil sounds:

* Vowels:
  * Monophthongs: 'a', 'aː', 'e', 'eː', 'i', 'iː', 'o', 'oː', 'u', 'uː'
  * Diphthongs: 'aj', 'aʋ'

* Consonants:
  * Nasals: 'm', 'n̪', 'n', 'ɳ', 'ɲ', 'ŋ'
  * Stops: 'p', 't̪', 'ʈ', 'k'
  * Affricates: 't͡ʃ', 'd͡ʒ'
  * Fricatives: 's', 'ʂ', 'ʃ', 'h'
  * Tap: 'ɾ'
  * Trill: 'r'
  * Approximants: 'ʋ', 'ɻ', 'j', 'l', 'ɭ'
  * Consonant cluster: 'kʂ'

* Special Symbols: '்' (denotes the absence of the inherent vowel)
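
As a rough illustration of how the phoneme list above could be used, the sketch below splits an IPA string into symbols from that inventory with greedy longest-match. It is not the preprocessing code behind this model: the names `PHONEMES`, `SPECIAL`, `INVENTORY`, `VOCAB`, and `tokenize_ipa` are hypothetical, and the sample strings are hand-written IPA, not verified Epitran output.

```python
import unicodedata

# Phoneme inventory copied from the list above (36 symbols).
PHONEMES = [
    "a", "aː", "e", "eː", "i", "iː", "o", "oː", "u", "uː",  # monophthongs
    "aj", "aʋ",                                              # diphthongs
    "m", "n̪", "n", "ɳ", "ɲ", "ŋ",                            # nasals
    "p", "t̪", "ʈ", "k",                                      # stops
    "t͡ʃ", "d͡ʒ",                                              # affricates
    "s", "ʂ", "ʃ", "h",                                      # fricatives
    "ɾ", "r",                                                # tap, trill
    "ʋ", "ɻ", "j", "l", "ɭ",                                 # approximants
    "kʂ",                                                    # consonant cluster
]
SPECIAL = ["்"]  # Tamil sign marking the absence of the inherent vowel
INVENTORY = PHONEMES + SPECIAL

# Assumption: one integer id per symbol, in list order.
VOCAB = {symbol: index for index, symbol in enumerate(INVENTORY)}


def tokenize_ipa(ipa, inventory=INVENTORY):
    """Split an IPA string into inventory symbols (greedy longest-match).

    In practice the IPA string would come from Epitran, e.g.
        import epitran
        ipa = epitran.Epitran("tam-Taml").transliterate(text)
    """
    # Normalize both sides so combining marks compare consistently.
    ipa = unicodedata.normalize("NFD", ipa)
    symbols = sorted(
        (unicodedata.normalize("NFD", s) for s in inventory),
        key=len,
        reverse=True,  # try multi-character symbols such as 't͡ʃ' first
    )
    tokens = []
    i = 0
    while i < len(ipa):
        if ipa[i].isspace():
            i += 1  # skip word boundaries
            continue
        for symbol in symbols:
            if ipa.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError(f"symbol not in inventory at position {i}: {ipa[i]!r}")
    return tokens


print(tokenize_ipa("ʋaɳakkam"))  # ['ʋ', 'a', 'ɳ', 'a', 'k', 'k', 'a', 'm']
print(tokenize_ipa("kʂaː"))      # ['kʂ', 'aː']
```

Greedy longest-match always prefers the longer reading, so 'a' followed by 'j' is tokenized as the diphthong 'aj' and 'k' followed by 'ʂ' as the cluster 'kʂ'. Whether that matches the segmentation used in training is an assumption of this sketch.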