Transcription
Can you run transcription via the Inference API? It only translates to English with the provided code snippet.
You should be able to! The trick is to set the `forced_decoder_ids` properly with the processor. Separately, we are looking into adding language detection automatically.
You can set the `forced_decoder_ids` as follows:
from transformers import WhisperForConditionalGeneration, WhisperProcessor
checkpoint = "openai/whisper-large"
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
processor = WhisperProcessor.from_pretrained(checkpoint)
# inspect the default prompt: (position, token_id) pairs
print("Default:")
print(model.config.forced_decoder_ids)
# decode just the token ids to see which special tokens are forced
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))
# now change to Hindi (hi)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")
print("\nHindi:")
print(model.config.forced_decoder_ids)
print(processor.batch_decode([i[1] for i in model.config.forced_decoder_ids]))
Printed output:
Default:
[[1, 50259], [2, 50359], [3, 50363]]
['<|en|>', '<|transcribe|>', '<|notimestamps|>']
Hindi:
[(1, 50276), (2, 50359), (3, 50363)]
['<|hi|>', '<|transcribe|>', '<|notimestamps|>']
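To make it clearer what those `(position, token_id)` pairs do, here is a toy sketch of the mechanism: during generation, any position listed in `forced_decoder_ids` overrides whatever the model would have predicted there. The vocabulary subset and the "model" below are hypothetical stand-ins; only the token ids are taken from the printed output above.

```python
# (position, token_id) pairs, mirroring the Hindi output printed above
forced_decoder_ids = [(1, 50276), (2, 50359), (3, 50363)]

TOY_VOCAB = {  # hypothetical subset of Whisper's special tokens
    50259: "<|en|>",
    50276: "<|hi|>",
    50359: "<|transcribe|>",
    50363: "<|notimestamps|>",
}

def toy_generate(step_fn, max_len, forced=()):
    """Greedy decode where forced (position, token_id) pairs override the model."""
    forced_map = dict(forced)
    tokens = []
    for pos in range(1, max_len + 1):
        if pos in forced_map:
            tokens.append(forced_map[pos])   # forced: model's prediction is ignored
        else:
            tokens.append(step_fn(tokens))   # ordinary greedy step
    return tokens

def english_biased_step(prev_tokens):
    # stand-in "model" that would otherwise always pick <|en|>
    return 50259

out = toy_generate(english_biased_step, max_len=3, forced=forced_decoder_ids)
print([TOY_VOCAB[t] for t in out])
# → ['<|hi|>', '<|transcribe|>', '<|notimestamps|>']
```

This is why overriding `model.config.forced_decoder_ids` (or passing `forced_decoder_ids` to `generate` directly) switches the output language: the language token at position 1 is fixed before decoding even starts.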
@muertinho for now I have switched to https://replicate.com/openai/whisper/api#input-audio, since it seems to me that this is not possible with the currently provided Inference API. Maybe you'd have to host your own.
@Spotex93 thank you for the link. I see the same issue you mentioned above and will look into replicate.com. @sanchit-gandhi thank you for your reply and efforts as well. However, I am looking for a way to run it over a cloud API.
@muertinho let me know if you find a better solution!