Evaluation Result?
Can you share the evaluation results for this model?
Hi @kobkrit, the CTranslate2 format shouldn't degrade the WER relative to the base model in most cases, so the evaluation should theoretically be close to the base model's.
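For context, a CTranslate2 checkpoint is only a converted (and optionally quantized) copy of the original weights, which is why the accuracy should stay close. Here is a minimal sketch of such a conversion, assuming the ctranslate2 package and using openai/whisper-large-v2 purely as a stand-in for the actual base checkpoint:

import ctranslate2

# Convert a Hugging Face Whisper checkpoint to CTranslate2 format.
# "openai/whisper-large-v2" is a placeholder; substitute the actual Thai base model.
converter = ctranslate2.converters.TransformersConverter("openai/whisper-large-v2")
converter.convert("whisper-large-ct2", quantization="float16")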
However, WhisperX applies VAD preprocessing, so the results might actually be better. I therefore ran my own evaluation on Common Voice 11.
The WER is around 0.66 (no space) and around 1.05 (raw output), while whisper-th-large is around 15, which is better than I expected.
(I will run the evaluation again in the future without using combined_text and will let you know once it's done.)
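The gap between the two numbers is expected: the wer metric splits on whitespace, so stripping spaces from largely unsegmented Thai text collapses each sentence to very few "words" and changes what an error costs. A quick illustration (the Thai strings are just made-up examples):

from evaluate import load

wer = load("wer")

# With spaces the metric compares word by word: one deleted word out of three -> WER ~0.33
print(wer.compute(predictions=["สวัสดี ครับ"], references=["สวัสดี ครับ ผม"]))

# Without spaces each sentence is a single "word", so any mismatch counts as one full error -> WER 1.0
print(wer.compute(predictions=["สวัสดีครับ"], references=["สวัสดีครับผม"]))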
Here's how you can replicate the evaluation. You can modify the code to support batching, but thanks to the speed of this model, a simple loop took me only around 30 minutes for the 10K-sample test set.
from datasets import load_dataset
from evaluate import load
from tqdm import tqdm
import whisperx

# Load the CTranslate2 Whisper model
device = "cuda"
batch_size = 16
compute_type = "float16"
model = whisperx.load_model("Thaweewat/whisper-th-large-ct2", device, compute_type=compute_type)

# Load the WER metric and the Common Voice 11 Thai test split
wer = load("wer")
test_dataset = load_dataset("mozilla-foundation/common_voice_11_0", "th", split="test")

# Initialize lists for predictions and references
predictions = []
references = []

# Process each audio file
for sample in tqdm(test_dataset, desc="Processing Audio Files"):
    audio_file = sample['path']
    ground_truth = sample['sentence']

    # Load and transcribe audio
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=batch_size, language='th')

    # Join the segment texts for evaluation
    combined_text = ' '.join(segment['text'] for segment in result['segments'])
    predictions.append(combined_text)
    references.append(ground_truth)

# Compute WER on the raw output and on predictions with spaces removed
predictions_no_space = [text.replace(" ", "") for text in predictions]
wer_score = wer.compute(predictions=predictions, references=references)
wer_score_no_space = wer.compute(predictions=predictions_no_space, references=references)

print(f"Word Error Rate (raw output): {wer_score}")
print(f"Word Error Rate (no space): {wer_score_no_space}")
Superb!!!!!!!!