---
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: wav2vec2-xls-r-phone-mfa_korean
  results: []
language:
- ko
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# wav2vec2-xls-r-300m_phoneme-mfa_korean
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) for Korean phoneme recognition, trained on a phonetically balanced native Korean read-speech corpus.
# Training and Evaluation Data
## Training Data
- Data Name: Phonetically Balanced Native Korean Read-speech Corpus
- Num. of Samples: 54,000
- Audio Length: 108 Hours
## Evaluation Data
- Data Name: Phonetically Balanced Native Korean Read-speech Corpus
- Num. of Samples: 6,000
- Audio Length: 12 Hours
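
As a quick sanity check on the figures above, the following sketch derives the average clip length and the relative size of the evaluation set. It assumes the training and evaluation samples are disjoint parts of the same corpus, which the card implies but does not state; the helper name `mean_seconds` is only illustrative.

```python
# Figures copied from the training and evaluation data lists.
train_samples, train_hours = 54_000, 108
eval_samples, eval_hours = 6_000, 12


def mean_seconds(samples: int, hours: float) -> float:
    """Average clip duration in seconds for a set of recordings."""
    return hours * 3600 / samples


train_mean = mean_seconds(train_samples, train_hours)
eval_mean = mean_seconds(eval_samples, eval_hours)

# Assumption: the two sets do not overlap, so their sizes can be summed.
eval_share = eval_samples / (train_samples + eval_samples)

print(f"train: {train_mean:.1f} s per sample")
print(f"eval:  {eval_mean:.1f} s per sample")
print(f"eval share of all samples: {eval_share:.0%}")
```

Both sets work out to roughly 7.2 seconds per sample, and the evaluation set is about one tenth of the combined data.
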
# Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 20 (with early stopping, patience of 5 epochs)
- mixed_precision_training: Native AMP
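
To make the schedule concrete, here is a minimal sketch in plain Python of how these values combine into an effective batch size, a step budget, and a linear warmup/decay learning-rate curve. It assumes a single training device (8 × 2 = 16 matches the listed total batch size), that all 20 epochs run (early stopping may end training sooner, so the step counts are an upper bound), and that the linear schedule warms up from 0 and decays back to 0. The function `linear_lr` is a hypothetical helper written for illustration, not the Trainer's own code.

```python
import math

# Values taken from the hyperparameter list and the training data section.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.2
NUM_EPOCHS = 20
NUM_TRAIN_SAMPLES = 54_000

# One optimizer step consumes batch_size * accumulation_steps samples.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Optimizer steps per epoch and over the full run (upper bound).
batches_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
steps_per_epoch = batches_per_epoch // GRAD_ACCUM_STEPS
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        factor = step / warmup_steps
    else:
        remaining = max(0, total_steps - step)
        factor = remaining / (total_steps - warmup_steps)
    return LEARNING_RATE * factor


print("effective batch size:", effective_batch)
print("steps per epoch:", steps_per_epoch)
print("total steps (upper bound):", total_steps)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends, one fifth of the way through the step budget.
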
# Evaluation Result
Phone Error Rate (PER): 3.88%
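
The card does not say how the phone error rate was computed. A common definition is the edit distance between reference and predicted phone sequences (substitutions, deletions, and insertions) divided by the number of reference phones, and the sketch below assumes that definition. The functions `edit_distance` and `phone_error_rate` are illustrative helpers, and the phone sequences are made-up placeholders, not outputs of this model.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of an extra phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit errors divided by the total number of reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Placeholder sequences for illustration only.
references = [["n", "a", "m", "u"], ["s", "a", "n"]]
predictions = [["n", "a", "m", "u"], ["s", "a"]]

per = phone_error_rate(references, predictions)
print(f"PER: {per:.2%}")
```

Summing errors over the whole set before dividing weights each utterance by its length, which is the usual corpus-level convention; averaging per-utterance rates would give a different number.
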
# Output Examples
![output_examples](./output_examples.png)
# MFA-IPA Phoneset Tables
## Vowels
![mfa_ipa_chart_vowels](./mfa_ipa_chart_vowels.png)
## Consonants
![mfa_ipa_chart_consonants](./mfa_ipa_chart_consonants.png)
# Experimental Results
This model is the official implementation for a paper that is currently in review. The table below summarizes the major error patterns of L2 Korean speech produced by speakers of five different L1s: Chinese (ZH), Vietnamese (VI), Japanese (JP), Thai (TH), and English (EN).
![Experimental Results](./ICPHS2023_table2.png)
# Framework versions
- Transformers 4.21.3
- Pytorch 1.12.1
- Datasets 2.4.0
- Tokenizers 0.12.1