RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440

#23
by WajihUllahBaig - opened

I have been trying to use the example; so far I have ended up with the following error:

File ~/anaconda3/envs/triton/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:261 in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)

RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440

The issue is generally with the transformers version. You will need transformers>=4.31.0 to make this work.
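
If you are unsure which version you are on, upgrading and verifying takes two commands (the >= pin just mirrors the fix above):

pip install --upgrade "transformers>=4.31.0"
python -c "import transformers; print(transformers.__version__)"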

Thanks. That seemed to be the problem.

WajihUllahBaig changed discussion status to closed

How to solve it?

The issue is generally with the transformers version. You will need transformers>=4.31.0 to make this work.

I upgraded transformers to 4.31.0, but that didn't solve it.

And one strange problem: 7B and 13B work, but 70B fails.

I have the same issue with the 70B version of the models.
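
For what it's worth, the numbers in the error explain why only 70B breaks: Llama-2-70B uses grouped-query attention (8 key/value heads instead of 64), and transformers only added GQA support for Llama in 4.31.0. A quick sanity check of the arithmetic (head counts taken from the 70B config):

# k_proj under GQA produces 8 KV heads of dim 128, not 64,
# so the tensor holds 1 * 60 * 8 * 128 = 61440 elements and
# cannot be viewed as [1, 60, 64, 128] (that would need 491520).
bsz, q_len, head_dim = 1, 60, 128
num_heads, num_kv_heads = 64, 8
assert bsz * q_len * num_kv_heads * head_dim == 61440   # actual k_proj output size
assert bsz * q_len * num_heads * head_dim == 491520     # what the old view() expects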

You also need python>=3.8 to address this issue.

Same issue (but on the Llama-3-8B model).
Python 3.9 and transformers==4.41.0 don't work :/

Any solution?
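
One thing worth checking first: confirm the checkpoint's head layout matches what your transformers version expects. A diagnostic sketch (the repo is gated, so you may need to be logged in with an access token):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# Llama 3 8B also uses grouped-query attention: 32 attention heads, 8 KV heads.
print(cfg.num_attention_heads, cfg.num_key_value_heads)

If the error persists on a version that supports GQA, the mismatch is likely elsewhere (e.g. a stale cached modeling file or a mixed environment).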

model: 'meta-llama/Meta-Llama-3-8B-Instruct'
using a Tesla K80
CUDA 11.6, NVIDIA 470 drivers
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0+cu116 -f https://download.pytorch.org/whl/cu116/torch_stable.html
pip install -r requirements.txt
requirements.txt:
transformers==4.31.0 # For working with Meta LLaMA and BitsAndBytesConfig
accelerate==0.21.0 # For multi-GPU handling and model acceleration
bitsandbytes==0.38.1 # For 8-bit quantization
scipy==1.9.3

It works fine.
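
For reference, a minimal loading sketch that matches the pinned requirements above (8-bit via BitsAndBytesConfig; the model id is the one from this comment, and device_map="auto" assumes accelerate is installed):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # model id from this comment
quant = BitsAndBytesConfig(load_in_8bit=True)     # 8-bit, matching the bitsandbytes pin above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",  # multi-GPU placement via accelerate, per the pin above
)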
