metadata

language:
  - fa
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_11_0
metrics:
  - wer
base_model: openai/whisper-small
model-index:
  - name: whisper_small-fa_v03
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: mozilla-foundation/common_voice_11_0 fa
          type: mozilla-foundation/common_voice_11_0
          config: fa
          split: test
          args: fa
        metrics:
          - type: wer
            value: 23.1451
            name: Wer

whisper_small-fa_v03

This model is a fine-tuned version of openai/whisper-small on the Persian (fa) subset of the mozilla-foundation/common_voice_11_0 dataset. Training used data augmentation with the audiomentations library, and the hyperparameters were selected through hyperparameter tuning. It achieves the following results on the evaluation set:

Loss: 0.1813
Wer: 23.1451
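
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words and reported here as a percentage. The snippet below is a minimal sketch of that definition for a single sentence pair. It is not the evaluation code used for this model; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for one reference/hypothesis pair.

    Illustrative helper (not from the original training notebooks).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr_row.append(
                min(
                    prev_row[j] + 1,                      # deletion
                    curr_row[j - 1] + 1,                  # insertion
                    prev_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev_row = curr_row

    return 100.0 * prev_row[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

A corpus-level score such as the one reported above is normally computed by summing the errors over all utterances and dividing by the total number of reference words, rather than by averaging per-sentence scores.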

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

You can find the notebooks here.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6.15044e-05
train_batch_size: 8
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 5000
mixed_precision_training: Native AMP
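
To make the schedule concrete, the sketch below computes the learning rate at a few steps from the values listed above. It assumes the linear scheduler ramps up from zero to the peak rate over the warmup steps and then decays linearly to zero at the final training step; the function name `linear_lr` is illustrative and not taken from the training notebooks.

```python
PEAK_LR = 6.15044e-05   # learning_rate
WARMUP_STEPS = 500      # lr_scheduler_warmup_steps
TOTAL_STEPS = 5000      # training_steps


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step under linear warmup and linear decay.

    Assumed schedule shape; hypothetical helper for illustration only.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from the peak rate down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 2750, 5000):
    print(f"step {step:>4}: lr = {linear_lr(step):.6e}")
```

Under this assumption the rate peaks at step 500, falls to half the peak at step 2750, and reaches zero at step 5000.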

Training results

Step	Training Loss	Validation Loss	Wer
500	1.210100	0.439317	44.17001
1000	0.717500	0.385981	40.53219
1500	0.585800	0.312391	35.52059
2000	0.508400	0.274010	31.00885
2500	0.443500	0.244815	29.79515
3000	0.392700	0.216328	27.24362
3500	0.340100	0.213681	26.00705
4000	0.236700	0.198893	28.51612
4500	0.212000	0.186622	25.88944
5000	0.183800	0.181340	23.14515

Framework versions

Transformers 4.26.0
Pytorch 2.0.1+cu117
Datasets 2.8.0
Tokenizers 0.13.3