How to use this in Colab?

#1
by alloc7260 - opened

I tried with this code:

import librosa
import torch

speech_array, sampling_rate = librosa.load(audio_path, sr=16000)

input_values = tokenizer(speech_array, return_tensors="pt").input_values.to("cuda")

with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(predicted_ids)[0]

but it gives me this:
llenĠsignificaĠPassĠbandsĠNeedóÅĤлаÑĢbusĠHookĠrépondĠquestionnaireĠassembleĠtheoreticallyĠcientoODpersonalĠLiquidODà¸ĽĠsausagesĠfindingĠheavenĠ모ĠImmerĠrecognitionĠëª¨à¸ĽĠvalleyvialgyptĠatmosphere×Ļ×ĶĠanyhowĠодинĠdünyĠseleĠGeoffĠGrandeĆинаĠMuseumĠmerdeíķĻuestasÑĥкиĠprioritĠunwantedĠlawyersĠsoberplexvialè¡ĢĠbarkinghipsĠterrorismĠliquorĠAuroraеÑ

This is not what I want.

The model cards have been updated with relevant information regarding their inference.
The code snippets in the model card should clarify this issue.
Do let me know if your issue persists.

You may also find the fine-tuning and evaluation scripts in the following repository useful:
https://github.com/vasistalodagala/whisper-finetune
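For reference, the garbled output above is what you get when you run a per-frame argmax over the logits, which is how CTC models like Wav2Vec2 are decoded. Whisper is an encoder-decoder model, so transcription has to go through model.generate() (which the ASR pipeline calls internally). Below is a minimal sketch, assuming the checkpoint is a Whisper fine-tune; the model ID here is a placeholder, so substitute the actual model ID from the model card:

import torch
from transformers import pipeline

def transcribe(audio_path: str, model_id: str = "openai/whisper-small") -> str:
    """Transcribe an audio file with a Whisper checkpoint.

    The "automatic-speech-recognition" pipeline handles resampling,
    feature extraction, and autoregressive decoding via generate(),
    so no manual argmax over logits is needed.
    """
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=0 if torch.cuda.is_available() else -1,
    )
    return asr(audio_path)["text"]

Calling transcribe("sample.wav") should return plain text rather than raw byte-level BPE tokens. The exact snippet in the model card remains authoritative.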
