---
language:
  - km
license: apache-2.0
tags:
  - hf-asr-leaderboard
  - generated_from_trainer
datasets:
  - openslr
  - google/fleurs
metrics:
  - wer
model-index:
  - name: Whisper Small Khmer Spaced - Seanghay Yath
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Google FLEURS
          type: google/fleurs
          config: km_kh
          split: all
        metrics:
          - name: Wer
            type: wer
            value: 0.6464
---

# whisper-small-khmer

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Google FLEURS and OpenSLR (SLR42) Khmer datasets. It achieves the following results on the evaluation set (a sketch for reproducing the WER metric follows the list):

- Loss: 0.4657
- WER: 0.6464
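
The reported WER can be computed with the Hugging Face `evaluate` library; a minimal sketch (the strings below are placeholders, not evaluation data):

```python
import evaluate

# Word error rate, the metric reported above.
wer_metric = evaluate.load("wer")

# Placeholder strings; in practice, run the model over the FLEURS
# km_kh test split and collect predictions and reference transcripts.
predictions = ["a placeholder prediction"]
references = ["a placeholder reference"]

print(wer_metric.compute(predictions=predictions, references=references))
```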

## Model description

This model was fine-tuned on the Google FLEURS and OpenSLR (SLR42) Khmer datasets. It can be used with the Transformers `pipeline` API:

```python
from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

result = pipe(
    "audio.wav",
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
    batch_size=16,
)

print(result["text"])
```
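
The pipeline wraps the processor and model; if you need direct control over generation, a minimal sketch using the same checkpoint (loading audio via `soundfile` is an assumption; any 16 kHz mono source works):

```python
import soundfile as sf
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("seanghay/whisper-small-khmer")
model = WhisperForConditionalGeneration.from_pretrained("seanghay/whisper-small-khmer")

# Expects 16 kHz mono audio; resample beforehand if necessary.
speech, sampling_rate = sf.read("audio.wav")
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# Force Khmer transcription, mirroring the pipeline's generate_kwargs.
forced_ids = processor.get_decoder_prompt_ids(language="km", task="transcribe")
generated_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```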

## whisper.cpp

1. Transcode the input audio to 16 kHz mono PCM:

```bash
ffmpeg -i audio.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```

2. Transcribe with whisper.cpp:

```bash
./main -m ggml-model.bin -f output.wav --print-colors --language km
```
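
If you need to produce `ggml-model.bin` from this checkpoint yourself, whisper.cpp ships a conversion script; a sketch, assuming local clones of whisper.cpp and openai/whisper, with this model downloaded to `./whisper-small-khmer`:

```bash
# Run from the whisper.cpp checkout; the second argument must point to a
# clone of the openai/whisper repository (used for its mel filter assets).
python models/convert-h5-to-ggml.py ./whisper-small-khmer ../whisper ./
```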

## Training and evaluation data

- training = `google/fleurs['train+validation']` + `openslr['train']`
- eval = `google/fleurs['test']` (both are sketched below)
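
A minimal sketch of assembling these splits with the `datasets` library (the column alignment and resampling are assumptions about the preprocessing, not the exact training code):

```python
from datasets import Audio, concatenate_datasets, load_dataset

# FLEURS Khmer (km_kh): train + validation for training, test for eval.
fleurs_train = load_dataset("google/fleurs", "km_kh", split="train+validation")
fleurs_test = load_dataset("google/fleurs", "km_kh", split="test")

# OpenSLR SLR42 (Khmer), train split only.
openslr_train = load_dataset("openslr", "SLR42", split="train")

# Align schemas before concatenating: FLEURS stores text in
# "transcription", SLR42 in "sentence".
openslr_train = openslr_train.rename_column("sentence", "transcription")
keep = ("audio", "transcription")
fleurs_train = fleurs_train.remove_columns(
    [c for c in fleurs_train.column_names if c not in keep]
)
openslr_train = openslr_train.remove_columns(
    [c for c in openslr_train.column_names if c not in keep]
)

# Resample everything to Whisper's expected 16 kHz.
fleurs_train = fleurs_train.cast_column("audio", Audio(sampling_rate=16000))
openslr_train = openslr_train.cast_column("audio", Audio(sampling_rate=16000))

train_data = concatenate_datasets([fleurs_train, openslr_train])
eval_data = fleurs_test
```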

## Training procedure

This model was trained using the accompanying project on GitHub, on a single NVIDIA A10 (24 GB) GPU.

### Training hyperparameters

The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 6.25e-06
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 800
- training_steps: 8000
- mixed_precision_training: Native AMP
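
These settings correspond roughly to Transformers' `Seq2SeqTrainingArguments` as sketched below (the `output_dir` is a placeholder, not the exact training script):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-khmer",  # placeholder path
    learning_rate=6.25e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=800,
    max_steps=8000,
    fp16=True,  # "Native AMP" mixed-precision training
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer settings listed above.
)
```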

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.2065        | 3.37  | 1000 | 0.3403          | 0.7929 |
| 0.0446        | 6.73  | 2000 | 0.2911          | 0.6961 |
| 0.008         | 10.1  | 3000 | 0.3578          | 0.6627 |
| 0.003         | 13.47 | 4000 | 0.3982          | 0.6564 |
| 0.0012        | 16.84 | 5000 | 0.4287          | 0.6512 |
| 0.0004        | 20.2  | 6000 | 0.4499          | 0.6419 |
| 0.0001        | 23.57 | 7000 | 0.4614          | 0.6469 |
| 0.0001        | 26.94 | 8000 | 0.4657          | 0.6464 |

### Framework versions

- Transformers 4.28.0.dev0
- Pytorch 2.0.0+cu117
- Datasets 2.11.1.dev0
- Tokenizers 0.13.3