Can't run the model
I use this command for inference:
CUDA_VISIBLE_DEVICES=0 python llama_inference.py elinas/vicuna-13b-4bit --wbits 4 --groupsize 128 --load vicuna-13b-4bit/vicuna-13b-4bit-128g.safetensors --text "this is llama"
and it gives me an error when loading the state dict.
See this in the README and ensure you're on the stable commit. https://huggingface.co/elinas/vicuna-13b-4bit#update-2023-04-03
With the stable commit indicated in the readme, I'm able to load the model fine but inference still fails with the following error:
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: at::Tensor) -> None
Any ideas? Is there a particular commit of transformers you are pinned to?
This does not support llama.cpp; it is for GPTQ via CUDA (or Triton).
Right, I understand -- this is running on an A5000 in the cloud. Perhaps it's not using the correct device, I will investigate a bit further.
EDIT -- was able to get it working, needed to rerun setup_cuda
fatal: reference is not a tree: a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773
I am getting a reference error when checking out the commit specified.
I have not seen that error format before so I assumed you were using the Python llama.cpp wrapper.
Paste the topmost entry from the output of git log.
I think you just need to do git fetch origin a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773
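If it helps, the fetch-then-checkout step can be scripted. This is only a rough sketch: the helper names commit_commands and checkout_commit are made up for illustration, and it assumes the remote (origin by default) still serves that commit. If the commit has been removed from the remote, the fetch will fail and you would need to point it at a remote that still has it.

```python
import subprocess


def commit_commands(sha, remote="origin"):
    """Build the two git commands: fetch one commit by hash, then check it out.

    Hypothetical helper; `remote` is assumed to still contain `sha`.
    """
    return [
        ["git", "fetch", remote, sha],
        ["git", "checkout", sha],
    ]


def checkout_commit(sha, repo_dir=".", remote="origin"):
    """Run the commands inside repo_dir, stopping at the first failure."""
    for cmd in commit_commands(sha, remote):
        # check=True raises CalledProcessError if git exits non-zero,
        # e.g. when the remote no longer has the commit.
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Example call (run from inside the cloned repo):
# checkout_commit("a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773")
```

Fetching the hash first is what avoids the "reference is not a tree" message, since that error means the local clone has never downloaded the commit object.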
I was able to fix the quant problem, but now it gives me this error:
Missing key(s) in state_dict
@nealchandra Thanks, I saw that the commit is no longer in the base repo (I haven't fetched, so I can still check out that commit). I have updated the instructions to just use the fork.
Make sure you have followed the above steps, such as running python setup_cuda.py install and having everything in requirements.txt installed.
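For anyone still seeing Missing key(s) in state_dict after that, it can help to check exactly which keys differ between the model and the checkpoint. Below is a minimal sketch, not part of this repo: diff_state_dict_keys is a hypothetical helper that compares two collections of key names, and the commented usage assumes you have a model object and the safetensors package available.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare the keys a model expects with the keys a checkpoint provides.

    Hypothetical helper for illustration; accepts any iterables of strings.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        # Keys the model wants but the checkpoint lacks ("Missing key(s)").
        "missing": sorted(model_keys - checkpoint_keys),
        # Keys in the checkpoint that the model does not define.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Possible usage (assumes `model` is already built and safetensors is installed):
# from safetensors import safe_open
# path = "vicuna-13b-4bit/vicuna-13b-4bit-128g.safetensors"
# with safe_open(path, framework="pt") as f:
#     report = diff_state_dict_keys(model.state_dict().keys(), f.keys())
# print(report["missing"][:10])
```

If the missing keys all follow one pattern, that usually points at the code and the checkpoint coming from different versions rather than at a corrupted file.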