
size mismatch

#12 by Bleking - opened

Hi. I recently tried to import this model as the LLM backbone of LLaVA-NeXT in order to finetune it. However, I keep getting size mismatch errors such as:

size mismatch for model.layers.31.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 3072]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 3072]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for model.layers.31.self_attn.o_proj.weight: copying a param with shape torch.Size([3072, 4096]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
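
For context, here is how I understand the shape arithmetic (a sketch, assuming the values I see in this repo's config.json: hidden_size=3072, num_attention_heads=32, num_key_value_heads=8, head_dim=128):

hidden_size = 3072
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128  # set explicitly in the config

# Shapes the checkpoint was saved with:
print(num_attention_heads * head_dim, hidden_size)  # 4096 3072 -> q_proj
print(num_key_value_heads * head_dim, hidden_size)  # 1024 3072 -> k_proj / v_proj

# Shapes produced if head_dim is instead derived as hidden_size // num_attention_heads = 96,
# which matches the "current model" shapes in the error above:
derived_head_dim = hidden_size // num_attention_heads
print(num_attention_heads * derived_head_dim, hidden_size)  # 3072 3072 -> q_proj
print(num_key_value_heads * derived_head_dim, hidden_size)  # 768 3072  -> k_proj / v_proj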

So I tried simply loading the model and its tokenizer with this minimal code:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Width-Base")
model = AutoModel.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Width-Base")

# Sanity check: the tokenizer's vocab should line up with the model's embedding table
print("Tokenizer vocab size:", tokenizer.vocab_size)
print("Model embedding size:", model.config.vocab_size)

but I still get the same error messages.
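
For reference, this is a quick check of what my environment builds versus what the checkpoint expects (a minimal sketch; the head_dim fallback below is my assumption about how a loader derives it when the config field is ignored):

import transformers
from transformers import AutoConfig

print("transformers version:", transformers.__version__)

cfg = AutoConfig.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Width-Base")
head_dim = getattr(cfg, "head_dim", cfg.hidden_size // cfg.num_attention_heads)
print("head_dim:", head_dim)
print("expected q_proj shape:", (cfg.num_attention_heads * head_dim, cfg.hidden_size))
print("expected k/v_proj shape:", (cfg.num_key_value_heads * head_dim, cfg.hidden_size))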

Is there a mismatch between the model and the packages in my virtual environment, or something else? I can provide more information if needed.

Thank you.
