Missing key(s) in state_dict
I get multiple size mismatch errors while trying to load the model. To my knowledge, the alpaca-native model was fine-tuned from the llama-13b model (given its file size of about 24 GB), yet the model name in this repo has "alpaca7b" in it. Where can I find the original 7b alpaca model?
It is the same size as point-alpaca's weights when I applied the diffs, so it's definitely alpaca-7B.
I'm not an expert, but the issue seems to lie between GPTQ-for-LLaMa and Alpaca (which is LLaMA fine-tuned and pruned): I saw a warning about a length mismatch when I tried to quantize it myself (the run aborted because the free Colab ran out of RAM, unfortunately).
I tried quantizing this model as well, and it fails because of the embedding size it was trained with. It would require modifying the keys to the appropriate values before converting, or it won't quantize correctly. At the moment only LoRA versions seem to convert fine without any extra work.
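To illustrate what "modifying the keys" can involve, here is a minimal, hedged sketch that renames state_dict entries before loading or converting (the file names are placeholders, and renaming alone does not fix the shape mismatches shown in the traceback below; the dependable fix is to use the same GPTQ-for-LLaMa revision that produced the checkpoint):

import torch

# Illustrative only: map newer-style ".qzeros" names back to the older
# ".zeros" names. Which renames are "appropriate" depends on the exact
# GPTQ-for-LLaMa revision in use.
sd = torch.load("alpaca7b-4bit.pt", map_location="cpu")
renamed = {key.replace(".qzeros", ".zeros"): value for key, value in sd.items()}
torch.save(renamed, "alpaca7b-4bit-renamed.pt")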
As stated on the model card, this was quantized from the fine-tuned 7b model at chavinlo/alpaca-native @cecc16dc15544ee626ae3dfb9dfc5cea8851cf1e. The original alpaca-native model is available there.
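If you want that exact snapshot, one way to fetch it is to pin the revision with the huggingface_hub client (a minimal sketch, assuming huggingface_hub is installed):

from huggingface_hub import snapshot_download

# Download chavinlo/alpaca-native pinned to the commit named above.
snapshot_download(
    repo_id="chavinlo/alpaca-native",
    revision="cecc16dc15544ee626ae3dfb9dfc5cea8851cf1e",
)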
Did you actually test this after quantizing it?
Yes. The inference script in the quant repo provided coherent results.
Here's an example invocation:
└─$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py /home/me/GPT/text-generation-webui/models/alpaca-7b --wbits 4 --load /home/me/GPT/text-generation-webui/models/alpaca-7b/alpaca-7b-4bit.pt --max_length 300 --text "$(cat test_prompt.txt)"
Loading model ...
Traceback (most recent call last):
File "/home/me/GPT/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference.py", line 108, in
model = load_quant(args.model, args.load, args.wbits)
File "/home/me/GPT/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference.py", line 52, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "/home/me/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.up_proj.zeros", [the same seven ".zeros" keys repeat for layers 1 through 31].
Unexpected key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", [the same seven ".qzeros" keys repeat for layers 1 through 31].
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([32, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([86, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([11008, 1]).
[the same seven size-mismatch errors repeat for layers 1 through 31]
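The pattern above (the loader expects ".zeros" keys while the checkpoint ships ".qzeros", and the scales have groupsize-shaped dimensions) points to a version mismatch between the GPTQ-for-LLaMa revision that produced the checkpoint and the one loading it. A quick way to check which naming convention a checkpoint uses (a minimal sketch; the path is a placeholder):

import torch

# Load the checkpoint on the CPU just to inspect its key names.
state_dict = torch.load("alpaca-7b-4bit.pt", map_location="cpu")

# Collect the distinct parameter-name suffixes, e.g. "qzeros" vs. "zeros".
print(sorted({key.rsplit(".", 1)[-1] for key in state_dict}))
# Seeing "qzeros" means the checkpoint came from a newer GPTQ-for-LLaMa
# than a loader that still expects plain "zeros".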
At first everybody thought it was broken, but ozcur was right: it's not (sorry about that).
Anyway, there are two methods:
1. Use GPTQ-for-LLaMa's inference script, as in the example provided on the model card:
python llama_inference.py /path/alpaca-native-4bit --wbits 4 --groupsize 128 --load /path/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 300 --text "your text"
2. Try the gptq-group-size branch of text-generation-webui (WIP, supports groupsize 128):
python server.py --model alpaca-native-4bit --gptq-bits 4 --gptq-model-type llama
https://github.com/oobabooga/text-generation-webui/pull/530
edit: with the new update of text-generation-webui it is:
python server.py --model alpaca-native-4bit --wbits 4 --model_type llama --groupsize 128
Thanks
Not sure what I'm missing here, but I keep getting the missing key(s) error.
This is the command I'm running, which is pretty much the same as the README: python llama_inference.py /content/models/ozcur/alpaca-native-4bit --wbits 4 --groupsize 128 --load /content/models/ozcur/alpaca-native-4bit/alpaca7b-4bit.pt --max_length 500 --text "Instruction: What is an alpaca? How is it different from a llama?"
Can anyone help, please?
GPTQ-for-LLaMa has seen some changes lately; are you sure you are not using the default triton git branch instead of the cuda one?
Edit: Check llama's entry on the textgen-webui wiki for more info.
Yeah, I'm on the latest triton branch.
I did check the textgen wiki, but still no luck. I have this issue with other 4-bit models as well.
That's probably the issue; switch to the cuda branch.
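For anyone following along, the switch usually looks something like this (the branch name comes from this thread; the setup_cuda.py rebuild step is an assumption based on that repo's usual layout and may differ between revisions):

cd text-generation-webui/repositories/GPTQ-for-LLaMa
git fetch origin
git checkout cuda             # switch off the default triton branch
python setup_cuda.py install  # rebuild the quantization kernels for this branch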
The cuda branch also fails. I (and others) seem to be having this problem with other models as well: https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/discussions/4
Should I be on a specific commit of the cuda branch? This stuff is developing so fast I can barely keep up!