Error: inference on long-context text
#6 · opened by ZhangYuanhan
This is my demo code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Initialize the tokenizer and model from the pretrained version on Hugging Face
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5-16k")
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5-16k")
# Prepare the text you want to infer on
text = "text" * 10000
inputs = tokenizer(text, return_tensors="pt", max_length=16384, truncation=True)
# Generate output using the model
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=16384, num_return_sequences=1)
# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
This is the error message:
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
RuntimeError: The size of tensor a (8192) must match the size of tensor b (10001) at non-singleton dimension 3
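The mismatch is easy to reproduce in isolation: the sliced causal mask only covers 8192 key positions, while the padded attention mask has one entry per prompt token (10001 here), so the element-wise multiply cannot broadcast. A minimal sketch with dummy tensors (shapes are taken from the traceback, purely illustrative):

import torch
# Cached causal mask is capped at 8192 key positions in its last dimension.
causal_mask = torch.zeros(1, 1, 10001, 8192, dtype=torch.bool)
# One attention-mask entry per prompt token.
attention_mask = torch.ones(1, 10001)
mask_length = attention_mask.shape[-1]  # 10001
# Same expression as the failing line: slicing [:10001] on a dim of size 8192
# silently returns all 8192 columns, so the shapes still disagree on dim 3.
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
# -> RuntimeError: The size of tensor a (8192) must match the size of tensor b (10001)
#    at non-singleton dimension 3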
It seems that the maximum size of the causal mask is 8192:
https://github.com/huggingface/transformers/blob/0290ec19c901adc0f1230ebdccad11c40af026f5/src/transformers/models/llama/modeling_llama.py#L1079
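For reference, a hedged diagnostic/workaround sketch, continuing from the demo snippet above (the `causal_mask` buffer name is an assumption about this transformers version, and 8192 is simply the size reported in the traceback). It prints the cached mask length next to the prompt length, then truncates the prompt so that prompt plus newly generated tokens stay within that limit. This only sidesteps the size cap, it does not fix it:

# Compare the cached mask size with the tokenized prompt length.
# The `causal_mask` buffer on the inner LlamaModel is an assumption for this
# transformers version; getattr keeps the check safe if it is absent.
cached_mask = getattr(model.model, "causal_mask", None)
print("cached causal mask length:", None if cached_mask is None else cached_mask.shape[-1])
print("prompt length in tokens:", inputs["input_ids"].shape[-1])

# Stopgap: keep prompt + generated tokens within the 8192 positions seen in the error.
max_new = 256
short_inputs = tokenizer(text, return_tensors="pt", max_length=8192 - max_new, truncation=True)
with torch.no_grad():
    outputs = model.generate(**short_inputs, max_new_tokens=max_new)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))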
env:
transformers: 4.38.2