---
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: wav2vec2-xls-r-phone-mfa_korean
results: []
language:
- ko
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# wav2vec2-xls-r-300m_phoneme-mfa_korean
Creator & Uploader: Jooyoung Lee ([email protected])

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on a phonetically balanced native Korean read-speech corpus.
# Training and Evaluation Data
## Training Data
- Data Name: Phonetically Balanced Native Korean Read-speech Corpus
- Number of Samples: 54,000
- Audio Length: 108 hours
## Evaluation Data
- Data Name: Phonetically Balanced Native Korean Read-speech Corpus
- Number of Samples: 6,000
- Audio Length: 12 hours
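As a quick consistency check on the figures above, assuming the listed hours are total audio durations: the data is split 90/10 between training and evaluation, and both sets have the same mean utterance length of about 7.2 seconds.

```latex
\frac{54{,}000}{54{,}000 + 6{,}000} = 0.9,
\qquad
\frac{108 \times 3600\ \text{s}}{54{,}000}
= \frac{12 \times 3600\ \text{s}}{6{,}000}
= 7.2\ \text{s per utterance}
```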
# Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 20 (with early stopping, patience of 5 epochs)
- mixed_precision_training: Native AMP
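Several of these entries interact. The effective batch size is train_batch_size × gradient_accumulation_steps = 8 × 2 = 16, and lr_scheduler_warmup_ratio is a fraction of the total number of optimizer steps. The sketch below works out those step counts and the resulting linear warmup/decay curve. It assumes a single device, that all 54,000 training samples were used, and that the schedule was sized for the full 20 epochs (early stopping may have ended training sooner). The function names `schedule_lengths` and `linear_lr` are illustrative, not the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 20
WARMUP_RATIO = 0.2
NUM_TRAIN_SAMPLES = 54_000  # assumption: no samples filtered out, single device


def schedule_lengths(num_samples, batch_size, accum_steps, epochs, warmup_ratio):
    """Return (steps_per_epoch, total_steps, warmup_steps) in optimizer updates."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    steps_per_epoch = batches_per_epoch // accum_steps  # one update per accumulated group
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


steps_per_epoch, total_steps, warmup_steps = schedule_lengths(
    NUM_TRAIN_SAMPLES, TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_EPOCHS, WARMUP_RATIO
)
print(TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS)  # effective batch size: 16
print(steps_per_epoch, total_steps, warmup_steps)  # 3375 67500 13500
print(linear_lr(warmup_steps, LEARNING_RATE, warmup_steps, total_steps))  # 0.0001 at the peak
```

Under these assumptions the learning rate climbs for the first 4 of the 20 epochs and then decays; if early stopping triggered, training ended before the rate reached zero.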
# Evaluation Result
Phone Error Rate (PER) on the evaluation data: 3.88%
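The card does not state how the PER was computed. A common definition, sketched here as an assumption, is the Levenshtein distance between the reference and predicted phone sequences (substitutions S, deletions D, insertions I), summed over the evaluation set and divided by the total number of reference phones N, i.e. PER = (S + D + I) / N. The sketch assumes phones are space-separated strings; the example phone sequences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER: total edits divided by total reference phones."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total


# Hypothetical reference/prediction pairs (space-separated phones).
refs = ["k a m s a", "p a p"]
hyps = ["k a m s a", "p a"]
print(phone_error_rate(refs, hyps))  # 1 deletion / 8 reference phones = 0.125
```

Pooling edits over the whole set weights long utterances more heavily than averaging per-utterance rates would; the two can differ, and the card does not say which was used.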
# Output Examples
![output_examples](./output_examples.png)
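Assuming the model uses the standard CTC head that wav2vec2 ASR fine-tuning typically adds (the card does not say so explicitly), its phone outputs come from per-frame predictions that are decoded greedily: take the best label per frame, merge consecutive repeats, then drop the blank symbol. The sketch below shows that procedure on toy scores; the 4-symbol vocabulary and the use of `<pad>` as the blank are hypothetical, not this model's actual MFA-IPA vocabulary.

```python
from itertools import groupby

BLANK = "<pad>"  # hypothetical: wav2vec2 CTC vocabularies commonly reuse the pad token as blank


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Pick the best label per frame, merge consecutive repeats, then drop blanks."""
    best = [vocab[max(range(len(s)), key=s.__getitem__)] for s in frame_scores]
    collapsed = [label for label, _ in groupby(best)]
    return [label for label in collapsed if label != blank]


# Toy example: made-up vocabulary and 7 frames of scores.
vocab = [BLANK, "k", "a", "m"]
frames = [
    [0.1, 0.8, 0.05, 0.05],  # k
    [0.1, 0.7, 0.1, 0.1],    # k again, merged with the previous frame
    [0.9, 0.0, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.8, 0.1, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.1, 0.7, 0.1],    # a, kept as a second phone
    [0.1, 0.0, 0.1, 0.8],    # m
]
print(" ".join(ctc_greedy_decode(frames, vocab)))  # k a a m
```

The blank is what lets CTC emit the same phone twice in a row: repeats are merged only when no blank frame separates them.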
# MFA-IPA Phoneset Tables
## Vowels
![mfa_ipa_chart_vowels](./mfa_ipa_chart_vowels.png)
## Consonants
![mfa_ipa_chart_consonants](./mfa_ipa_chart_consonants.png)
# Framework versions
- Transformers 4.21.3
- Pytorch 1.12.1
- Datasets 2.4.0
- Tokenizers 0.12.1