metadata

language:
  - ru
license: apache-2.0
tags:
  - hf-asr-leaderboard
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_11_0
metrics:
  - wer
base_model: openai/whisper-base
model-index:
  - name: whisper-base-fine_tuned-ru
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: common_voice_11_0
          type: mozilla-foundation/common_voice_11_0
          args: 'config: ru, split: test'
        metrics:
          - type: wer
            value: 41.216909250757055
            name: Wer

whisper-base-fine_tuned-ru

This model is a fine-tuned version of openai/whisper-base on the Russian (ru) configuration of the mozilla-foundation/common_voice_11_0 dataset. It achieves the following results on the evaluation set (the test split):

Loss: 0.4553
WER: 41.2169
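For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal, dependency-free sketch of that definition, expressed in percent to match the figure above. The function name `word_error_rate` is illustrative. The reported score was presumably computed with a standard metric library over the whole test split, and its text normalization may differ from the plain whitespace split assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (S + D + I) / N * 100 over whitespace tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("привет как дела", "привет как дело"))
```

A WER of 41.2169 therefore means roughly 41 word-level errors per 100 reference words on the evaluation set.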

Model description

The architecture is the same as the original model (see openai/whisper-base). The difference is that this checkpoint has been fine-tuned to transcribe Russian speech.

Intended uses & limitations

The same as for the original model (see openai/whisper-base).
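As an illustration of the intended use, the sketch below transcribes a single audio file with the Transformers `pipeline` API. It rests on several assumptions that this card does not state: the `<namespace>` part of the repository id is a placeholder that must be replaced with the actual Hub namespace, the helper name `transcribe` is hypothetical, and decoding a file path requires ffmpeg to be installed.

```python
# Placeholder repository id: the Hub namespace is not given in this card.
MODEL_ID = "<namespace>/whisper-base-fine_tuned-ru"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict; the transcript is under the "text" key.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` with a real repository id downloads the checkpoint on first use and returns the transcript as a string.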

Training and evaluation data

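The model was fine-tuned on mozilla-foundation/common_voice_11_0 and evaluated on the test split of its ru configuration (see the metadata above). More information is needed about the training splits and preprocessing.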

Training procedure

The model was fine-tuned using the following notebook (available only in Russian): https://github.com/blademoon/Whisper_Train

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 250
training_steps: 20000
mixed_precision_training: Native AMP
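To make the relationship between these values concrete, the sketch below maps them onto keyword names used by `transformers.Seq2SeqTrainingArguments`, derives the total batch size, and evaluates a linear schedule with warmup in plain Python. It is an illustration under stated assumptions, not the notebook's actual code: a single training device is assumed (which is consistent with 4 × 4 = 16), and the helpers `effective_batch_size` and `linear_lr` are hypothetical names.

```python
# Keyword names follow transformers.Seq2SeqTrainingArguments;
# the values are copied from the hyperparameter list above.
training_kwargs = {
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 250,
    "max_steps": 20000,
    "fp16": True,  # "Native AMP" mixed precision
}

NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(kwargs: dict, num_devices: int = NUM_DEVICES) -> int:
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step: int, kwargs: dict = training_kwargs) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    base_lr = kwargs["learning_rate"]
    warmup = kwargs["warmup_steps"]
    total = kwargs["max_steps"]
    if step < warmup:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / warmup
    # Decay linearly from base_lr at the end of warmup to 0 at the last step.
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(training_kwargs))
    for step in (0, 125, 250, 10125, 20000):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 1e-06 at step 250 and falls to zero at step 20000.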

Training results

Training Loss	Epoch	Step	Validation Loss	WER
0.702	0.25	500	0.8245	71.6653
0.5699	0.49	1000	0.6640	55.7048
0.5261	0.74	1500	0.6127	50.6215
0.4997	0.98	2000	0.5834	47.4541
0.4681	1.23	2500	0.5638	46.6262
0.4651	1.48	3000	0.5497	47.5090
0.4637	1.72	3500	0.5379	46.5700
0.4185	1.97	4000	0.5274	45.3160
0.3856	2.22	4500	0.5205	45.5871
0.4078	2.46	5000	0.5122	45.7190
0.4132	2.71	5500	0.5066	45.5004
0.3914	2.96	6000	0.4998	47.0011
0.3822	3.2	6500	0.4959	44.9570
0.3596	3.45	7000	0.4916	45.5578
0.3877	3.69	7500	0.4870	45.2476
0.3687	3.94	8000	0.4832	45.2159
0.3514	4.19	8500	0.4809	46.0254
0.3202	4.43	9000	0.4779	48.1306
0.3229	4.68	9500	0.4751	45.5724
0.3285	4.93	10000	0.4717	45.9436
0.3286	5.17	10500	0.4705	45.0510
0.3294	5.42	11000	0.4689	47.2111
0.3384	5.66	11500	0.4666	47.3393
0.316	5.91	12000	0.4650	43.2536
0.2988	6.16	12500	0.4638	42.9789
0.3046	6.4	13000	0.4629	42.4331
0.2962	6.65	13500	0.4614	40.2437
0.3008	6.9	14000	0.4602	39.5734
0.2749	7.14	14500	0.4593	40.1497
0.3001	7.39	15000	0.4588	42.6248
0.3054	7.64	15500	0.4580	40.3707
0.2926	7.88	16000	0.4574	39.4232
0.2938	8.13	16500	0.4569	40.9532
0.3105	8.37	17000	0.4566	40.4379
0.2799	8.62	17500	0.4562	40.3622
0.2793	8.87	18000	0.4557	41.3451
0.2819	9.11	18500	0.4555	41.4184
0.2907	9.36	19000	0.4555	39.9348
0.3113	9.61	19500	0.4553	41.0289
0.2867	9.85	20000	0.4553	41.2169
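The table shows that validation loss and WER do not reach their minimum at the same step: the lowest WER in the table (39.4232) is at step 16000, while the reported final checkpoint at step 20000 has the lowest validation loss (0.4553, tied with step 19500) but a WER of 41.2169. The short sketch below illustrates that comparison on a few rows copied from the table. The variable names are illustrative, and nothing here implies that a checkpoint other than the final one was published.

```python
# (step, validation_loss, wer) rows copied from the table above.
rows = [
    (14000, 0.4602, 39.5734),
    (16000, 0.4574, 39.4232),
    (19000, 0.4555, 39.9348),
    (19500, 0.4553, 41.0289),
    (20000, 0.4553, 41.2169),
]

# Select the evaluation step with the lowest WER.
best_by_wer = min(rows, key=lambda row: row[2])

# Select the step with the lowest validation loss; on a tie,
# min() keeps the earliest row in the list.
best_by_loss = min(rows, key=lambda row: row[1])

if __name__ == "__main__":
    print("lowest WER:", best_by_wer)
    print("lowest validation loss:", best_by_loss)
```

If WER is the quantity that matters for a given application, selecting checkpoints by WER instead of by loss or by final step may be worth considering when reproducing the training run.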

Framework versions

Transformers 4.24.0
Pytorch 1.13.1
Datasets 2.7.1
Tokenizers 0.13.1