whisper-tiny fine-tuned on a very large collection of Vietnamese speech datasets
TODO:
- train, then publish checkpoint
- evaluate WER on Common Voice & FLEURS & VIVOS
- convert to openai-whisper, whisper.cpp, faster-whisper
- convert to ONNX: to try https://github.com/k2-fsa/sherpa-onnx & https://github.com/zhuzilin/whisper-openvino
- convert to TensorRT: https://github.com/openai/whisper/discussions/169
21k steps, warm-up 5%, batch size 16×2 (kaggle free T4×2)
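
The schedule figures above can be turned into concrete numbers with a little arithmetic. This is only a sketch: it assumes "16×2" means 16 samples on each of the 2 T4 GPUs and that every step is one optimizer update with no gradient accumulation, neither of which is stated here.

```python
# Figures taken from the training summary line above.
TOTAL_STEPS = 21_000
WARMUP_FRACTION = 0.05
PER_DEVICE_BATCH = 16
NUM_DEVICES = 2  # assumption: "16×2" = 16 samples per GPU on 2 GPUs

# Number of steps spent ramping the learning rate up.
warmup_steps = round(TOTAL_STEPS * WARMUP_FRACTION)

# Samples consumed per optimizer step (assumes no gradient accumulation).
effective_batch = PER_DEVICE_BATCH * NUM_DEVICES

# Total training samples seen, counting repeats across epochs.
samples_seen = TOTAL_STEPS * effective_batch

print(warmup_steps)     # 1050
print(effective_batch)  # 32
print(samples_seen)     # 672000
```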
WER manually evaluated on the Vietnamese part of each test set:
@ float16 | CommonVoice v16.1 | FLEURS | VIVOS |
---|---|---|---|
original whisper-tiny | >100% | 88.6% | 62.5% |
this model | 26.6% | 37.1% | 18.7% |
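
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of word error rate: word-level edit distance divided by the number of reference words. It is not necessarily how the linked scripts compute it (those may use a dedicated library and text normalization); the function name `word_error_rate` is made up for illustration, and no normalization is applied here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("xin chào các bạn", "xin chào bạn"))  # 0.25
# Three inserted words against two reference words.
print(word_error_rate("xin chào", "xin xin chào chào chào"))  # 1.5
```

The second call shows why a WER above 100% is possible: insertions count as errors, so a model that outputs many extra words can accumulate more errors than there are reference words.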
all training + evaluation scripts are on my repo: https://github.com/phineas-pta/fine-tune-whisper-vi
usage example:
import torch
from transformers import pipeline
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}
PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"]