Evaluation Result?
Can you share the evaluation results for this model?
Hi @kobkrit, the CTranslate2 format shouldn't degrade the WER relative to the base model in most cases, so the evaluation should theoretically be close to the base model's.
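For context, a CTranslate2 checkpoint is only a converted (and optionally quantized) copy of the original weights, which is why the accuracy should stay close. Here is a minimal sketch of such a conversion, assuming the ctranslate2 package and using openai/whisper-large-v2 purely as a stand-in for the actual base checkpoint:

import ctranslate2

# Convert a Hugging Face Whisper checkpoint to CTranslate2 format.
# "openai/whisper-large-v2" is a placeholder; substitute the actual Thai base model.
converter = ctranslate2.converters.TransformersConverter("openai/whisper-large-v2")
converter.convert("whisper-large-ct2", quantization="float16")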
However, WhisperX applies VAD preprocessing, so the results might actually be better. I therefore ran my own evaluation on Common Voice 11.
The WER is around 0.66 (no space) and around 1.05 (raw output), while whisper-th-large is around 15, which is better than I expected.
(I will run the evaluation again in the future without using combined_text and will let you know once it's done.)
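The gap between the two numbers is expected: the wer metric splits on whitespace, so stripping spaces from largely unsegmented Thai text collapses each sentence to very few "words" and changes what an error costs. A quick illustration (the Thai strings are just made-up examples):

from evaluate import load

wer = load("wer")

# With spaces the metric compares word by word: one deleted word out of three -> WER ~0.33
print(wer.compute(predictions=["สวัสดี ครับ"], references=["สวัสดี ครับ ผม"]))

# Without spaces each sentence is a single "word", so any mismatch counts as one full error -> WER 1.0
print(wer.compute(predictions=["สวัสดีครับ"], references=["สวัสดีครับผม"]))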
Here's how you can replicate the evaluation. You can modify the code to support batching, but thanks to the speed of this model, a simple loop took me only around 30 minutes for the 10K-sample test set.
from datasets import load_dataset
from evaluate import load
from tqdm import tqdm
import whisperx

# Load the CTranslate2 Whisper model
device = "cuda"
batch_size = 16
compute_type = "float16"
model = whisperx.load_model("Thaweewat/whisper-th-large-ct2", device, compute_type=compute_type)

# Load the WER metric and the Common Voice 11 Thai test split
wer = load("wer")
test_dataset = load_dataset("mozilla-foundation/common_voice_11_0", "th", split="test")

# Initialize lists for predictions and references
predictions = []
references = []

# Process each audio file
for sample in tqdm(test_dataset, desc="Processing Audio Files"):
    audio_file = sample['path']
    ground_truth = sample['sentence']

    # Load and transcribe audio
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=batch_size, language='th')

    # Join the segment texts for evaluation
    combined_text = ' '.join(segment['text'] for segment in result['segments'])
    predictions.append(combined_text)
    references.append(ground_truth)

# Compute WER on the raw output and on predictions with spaces removed
predictions_no_space = [text.replace(" ", "") for text in predictions]
wer_score = wer.compute(predictions=predictions, references=references)
wer_score_no_space = wer.compute(predictions=predictions_no_space, references=references)

print(f"Word Error Rate (raw output): {wer_score}")
print(f"Word Error Rate (no space): {wer_score_no_space}")
Superb!!!!!!!!