---
license: apache-2.0
datasets:
- doof-ferb/vlsp2020_vinai_100h
- doof-ferb/fpt_fosd
- doof-ferb/infore1_25hours
- doof-ferb/infore2_audiobooks
- quocanh34/viet_vlsp
- linhtran92/final_dataset_500hrs_wer0
- linhtran92/viet_youtube_asr_corpus_v2
- google/fleurs
- mozilla-foundation/common_voice_16_1
- vivos
language: ["vi"]
metrics: ["wer"]
library_name: transformers
base_model: openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
model-index:
- name: doof-ferb/whisper-tiny-vi
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: mozilla-foundation/common_voice_16_1
      name: Mozilla CommonVoice (Vietnamese) v16.1
      config: vi
      split: test
    metrics:
    - type: wer
      value: 26.6
      verified: false
  - task:
      type: automatic-speech-recognition
    dataset:
      type: google/fleurs
      name: Google FLEURS (Vietnamese)
      config: vi_vn
      split: test
    metrics:
    - type: wer
      value: 37.1
      verified: false
  - task:
      type: automatic-speech-recognition
    dataset:
      type: vivos
      name: ĐHQG TPHCM VIVOS
      split: test
    metrics:
    - type: wer
      value: 18.7
      verified: false
---

Whisper Tiny fine-tuned on a large collection of Vietnamese speech datasets.

TODO:
- [x] train, then publish checkpoint
- [x] evaluate WER on Common Voice, FLEURS & VIVOS
- [ ] convert to `openai-whisper`, `whisper.cpp`, `faster-whisper` (one possible route is sketched below)
- [ ] convert to ONNX, to try with https://github.com/k2-fsa/sherpa-onnx & https://github.com/zhuzilin/whisper-openvino
- [ ] convert to TensorRT: https://github.com/openai/whisper/discussions/169
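
For the `faster-whisper` item, one plausible route (not yet done here) would be CTranslate2's Transformers converter. A minimal sketch under that assumption; the output directory name and quantization choice are arbitrary:

```python
# hypothetical sketch: export this checkpoint to the CTranslate2 format that faster-whisper loads
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "doof-ferb/whisper-tiny-vi",
    copy_files=["tokenizer.json", "preprocessor_config.json"],  # files faster-whisper expects next to the weights
)
converter.convert(output_dir="whisper-tiny-vi-ct2", quantization="float16")
```

The resulting folder should then load with `faster_whisper.WhisperModel("whisper-tiny-vi-ct2")`.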

Training setup: 21k steps, 5% warm-up, batch size 16×2 (on Kaggle's free dual-T4).
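
The exact scripts are in the repo linked below; purely as an illustration, that schedule maps onto `transformers` roughly as follows (only the listed hyper-parameters come from this card, everything else is assumed):

```python
from transformers import Seq2SeqTrainingArguments

# illustrative sketch of the reported schedule, not the author's actual config
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-vi",  # assumed
    max_steps=21_000,                # "21k steps"
    warmup_ratio=0.05,               # "5% warm-up"
    per_device_train_batch_size=16,  # ×2 GPUs → the "16×2" above
    fp16=True,                       # T4 GPUs have no bf16 support
    predict_with_generate=True,
)
```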

WER measured manually on the Vietnamese test split of each benchmark (a reproduction sketch follows the table):

| model @ `float16` | CommonVoice v16.1 | FLEURS | VIVOS |
|---|---|---|---|
| original `whisper-tiny` | >100% | 88.6% | 62.5% |
| this model | 26.6% | 37.1% | 18.7% |
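
A minimal sketch of such a measurement with the 🤗 `datasets` + `evaluate` libraries; the lower-casing is an assumed normalization, and the author's actual scripts (linked below) may preprocess differently:

```python
import torch, evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

# hypothetical reproduction sketch, not the author's exact evaluation script
pipe = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)
wer_metric = evaluate.load("wer")

ds = load_dataset("mozilla-foundation/common_voice_16_1", "vi", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # whisper expects 16 kHz input

predictions, references = [], []
for sample in ds:
    text = pipe(sample["audio"], generate_kwargs={"language": "vi", "task": "transcribe"})["text"]
    predictions.append(text.lower())
    references.append(sample["sentence"].lower())  # common_voice keeps transcripts in "sentence"

print(f"WER: {100 * wer_metric.compute(predictions=predictions, references=references):.1f}%")
```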

All training & evaluation scripts are in my repo: https://github.com/phineas-pta/fine-tune-whisper-vi

Usage example:

```python
import torch
from transformers import pipeline

# load the fine-tuned checkpoint on GPU, in float16 to match the WER table above
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device="cuda:0", torch_dtype=torch.float16)

# force Vietnamese transcription (rather than language auto-detection or translation)
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}

# transcribe an audio file
PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"]
```
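
Whisper natively sees only 30 s of audio at a time, so for longer recordings the pipeline's chunked inference can be enabled; `chunk_length_s=30` and `batch_size=8` below are common choices, not values this card prescribes:

```python
# transcribe a long recording by splitting it into 30 s windows, batched on GPU
PIPE("long_audio.mp3", chunk_length_s=30, batch_size=8, generate_kwargs=PIPE_KWARGS)["text"]
```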