Output in English
I was hoping to use this model so that I can speak in Hindi and get translated English output, since its understanding of Hindi is better than that of the non-fine-tuned Whisper.
Changing decoder_prompt_ids to (language="en", task="translate") had no effect, and adding generate_kwargs = {"task":"translate", "language":"<|en|>"} during inference stopped inference from working at all.
What should I try?
Since this model has been fine-tuned for 57,000 steps on about 3,500 hours of data, which is substantial, it has most likely become heavily biased towards generating Hindi tokens (which is, in a way, what this ASR fine-tuning is meant to do) and therefore does poorly on the translation task.
To obtain an English translation of Hindi audio, you could try the three methods listed below:
- Set decoder_prompt_ids to (language="hi", task="translate") and use OpenAI's original Whisper model. The Hindi->English accuracy is quite low, though.
- Use a speech translation model. There are not many good options for Indian languages, though.
- Use a cascade system. For example, get the Hindi output from a good automatic speech recognition (ASR) model (you could keep using this whisper-hindi-small for this), then pass that output to a machine translation model. Facebook's NLLB (https://huggingface.co/facebook/nllb-200-distilled-600M) is good enough for Hindi to English (roughly 20-30% error). In my opinion, this is the better option for getting an English translation of Hindi audio.
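The first and third options can be sketched with the Hugging Face transformers pipeline API. This is a minimal, untested sketch under several assumptions: a recent transformers release whose Whisper generation accepts language and task through generate_kwargs; openai/whisper-small as a stand-in for "OpenAI's original Whisper model"; and FINETUNED_HINDI_ASR as a hypothetical placeholder for the whisper-hindi-small repo ID. For Whisper's translate task, language names the language spoken in the audio (here "hi") and the output is always English, so language="en" tells the model the audio is English rather than requesting English output.

```python
from typing import Any, Callable

# Assumed model IDs for illustration. FINETUNED_HINDI_ASR is a hypothetical
# placeholder: substitute the whisper-hindi-small repo you are actually using.
WHISPER_ID = "openai/whisper-small"
FINETUNED_HINDI_ASR = "<your whisper-hindi-small repo id>"
NLLB_ID = "facebook/nllb-200-distilled-600M"


def whisper_translate_kwargs(source_language: str = "hi") -> dict:
    """generate_kwargs for Whisper's built-in speech translation.

    `language` is the language spoken in the audio; task="translate"
    always targets English, so there is no target-language setting.
    """
    return {"language": source_language, "task": "translate"}


def whisper_translate(audio: Any, model_id: str = WHISPER_ID) -> str:
    """Option 1: Hindi speech -> English text with an original Whisper checkpoint."""
    # Imported lazily so this module loads without transformers installed;
    # pipeline() may download model weights on first use.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio, generate_kwargs=whisper_translate_kwargs("hi"))
    return result["text"]


def cascade_translate(
    audio: Any,
    asr: Callable[[Any], dict],
    mt: Callable[[str], list],
) -> str:
    """Option 3: transcribe Hindi with an ASR model, then machine-translate to English.

    `asr` should behave like a transformers ASR pipeline (returns {"text": ...});
    `mt` like a translation pipeline (returns [{"translation_text": ...}]).
    """
    hindi_text = asr(audio)["text"]
    return mt(hindi_text)[0]["translation_text"]


def build_cascade(asr_id: str = FINETUNED_HINDI_ASR, mt_id: str = NLLB_ID):
    """Create real ASR and MT pipelines to pass into cascade_translate."""
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=asr_id)
    # NLLB uses FLORES-200 codes: hin_Deva = Hindi (Devanagari), eng_Latn = English.
    mt = pipeline("translation", model=mt_id, src_lang="hin_Deva", tgt_lang="eng_Latn")
    return asr, mt
```

Usage would look like asr, mt = build_cascade() followed by cascade_translate("clip.wav", asr, mt), where "clip.wav" is a hypothetical file (decoding a file path requires ffmpeg; a dict such as {"raw": array, "sampling_rate": 16000} also works). Taking the two stages as plain callables keeps them swappable, so you can try a different ASR or translation model without touching the cascade logic.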
Thank you so much for the detailed reply. I'm also looking to use it for mixed "Hinglish" audio. OpenAI's model does better than the cascade system for that, but the cascade works very well for pure Hindi.