Question about decoder_input_ids
#5
by SuperXXX - opened
When I use the example:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset
import torch
# load model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
# load dummy dataset and read sound files
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(
    ds[0]["audio"]["array"],
    sampling_rate=ds[0]["audio"]["sampling_rate"],
    return_tensors="pt",
).input_features
# generate logits
logits = model(input_features, decoder_input_ids=torch.tensor([[50258]])).logits
# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
I get the result ['<|startoftranscript|>'].
However, if I do
generated_ids = model.generate(inputs=input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
I get result Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.
Is this expected? Or how do I modify decoder_input_ids to get the same result?
The example in the README is for a single forward pass: it predicts only the one token that follows the decoder_input_ids you supply. Your second code snippet is correct for autoregressive generation!
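To connect the two, here is a minimal sketch of the greedy decoding that model.generate performs: call the forward pass repeatedly and append each argmax token to decoder_input_ids. The max_new_tokens cap is an arbitrary value for illustration, and note that generate also seeds the sequence with the model's forced decoder tokens, so its output can differ slightly from this bare loop.

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_features = processor(
    ds[0]["audio"]["array"],
    sampling_rate=ds[0]["audio"]["sampling_rate"],
    return_tensors="pt",
).input_features

# start from the model's configured decoder start token
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# greedy loop: each forward pass predicts the next token, which is
# appended to decoder_input_ids before the next pass
max_new_tokens = 100  # arbitrary cap for this sketch
for _ in range(max_new_tokens):
    with torch.no_grad():
        logits = model(input_features, decoder_input_ids=decoder_input_ids).logits
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
    if next_token.item() == model.config.eos_token_id:
        break  # stop once the model emits end-of-text

transcription = processor.batch_decode(decoder_input_ids, skip_special_tokens=True)[0]
print(transcription)

In practice you should still prefer model.generate: it does the same thing more efficiently (caching past key/values) and handles the forced decoder tokens and stopping criteria for you.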