Error when loading with oobabooga

#2
by DrSmurf - opened

I get this error message when trying to load the model with oobabooga:

Traceback (most recent call last):
File "/home/lukas/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 442, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/home/lukas/.local/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/lukas/.local/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/lukas/.local/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/pytorch_model-00001-of-00007.bin'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/lukas/text-generation-webui/server.py", line 102, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/home/lukas/text-generation-webui/modules/models.py", line 217, in load_model
model = LoaderClass.from_pretrained(checkpoint, **params)
File "/home/lukas/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "/home/lukas/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
) = cls._load_pretrained_model(
File "/home/lukas/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3109, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/lukas/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 445, in load_state_dict
with open(checkpoint_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/pytorch_model-00001-of-00007.bin'

What do I have to change there?

This issue occurs when the GPTQ parameters are not set in the ooba UI. Without them, text-generation-webui falls back to the standard transformers loader, which looks for pytorch_model-*.bin shards that a GPTQ repo doesn't contain. The README has instructions for setting these params.
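
One quick way to see what's going on is to check the model folder itself (a minimal sketch, using the path from the traceback above):

# A GPTQ repo ships a single quantised .safetensors file, not pytorch_model-*.bin shards
ls models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/*.safetensors
ls models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/pytorch_model-*.bin   # no matches, hence the FileNotFoundError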

So I assume you're using one of the 1024g models?

Unfortunately there's an issue at the moment - the UI won't let you set a groupsize of 1024. I've submitted some code to oobabooga to fix this (here: https://github.com/oobabooga/text-generation-webui/pull/1660) but it's not been merged yet.

Until this code is merged into text-generation-webui, you have three options:

  1. Use command line arguments to specify the correct GPTQ params. Launch the ooba UI with --wbits 4 --groupsize 1024 --model_type llama, e.g.:
 python server.py --wbits 4 --groupsize 1024 --model_type llama --quant_attn --warmup_autotune --fused_mlp  # plus any other command line args you want

(don't add --quant_attn --warmup_autotune --fused_mlp unless you're using Triton GPTQ-for-LLaMa - but hopefully you are using Triton as you're on Linux)

  2. If you don't want to use different command line arguments for some reason, you could instead use a groupsize 128 model. I have two available in the 128g-compat and 128g-latest branches (example clone commands at the end of this reply).

These have groupsize = 128 so you will be able to set groupsize = 128 in the UI as detailed in the README.

Note that 128g models use a little more VRAM and are therefore more likely to run out of memory or slow down. You can try CPU offloading to avoid that.

  3. Or, if you don't want to use a 128g model and you don't want to change the command line arguments, you could manually edit the UI's code to allow specifying groupsize 1024, as described here:
    image.png

Note that if you do this you won't be able to git pull new updates without first doing git reset --hard

I'd recommend option 1.

Hopefully ooba will soon merge my PR and then the UI issue will be resolved.
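
For option 2, here's a rough sketch of fetching one of those branches with git. The repo id is assumed from the folder name in the error above, and the target folder name is just an example:

# Fetch only the 128g-compat branch (needs git-lfs); any folder name under models/ works
git lfs install
git clone --single-branch --branch 128g-compat https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ-128g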

Thank you very much for this great answer!
Yes, I wanted to use the 1024g model as I have a P40 and fear that the VRAM usage might be too high with 128g.

Merged 8 hours ago

Hey sorry if this is a dumb question but I'm not really smart at this type of stuff.
I keep getting this error:

weight = weight.reshape(-1, self.groupsize, weight.shape[2])
RuntimeError: shape '[-1, 1024, 6656]' is invalid for input of size 44302336
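
(As an aside, the numbers in that error are enough to suggest a group-size mismatch: 44,302,336 = 6656 × 6656, so the tensor being reshaped is 6656 × 6656, and 6656 is not a multiple of 1024, so it cannot be viewed as [-1, 1024, 6656]. A purely illustrative check:)

python3 -c "print(44302336 == 6656*6656, 44302336 % (1024*6656))"   # prints: True 3407872 -> not divisible, so the reshape fails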

Firstly please check the sha256sum of the downloaded models and confirm it matches the ones listed in this HF repo. This might be because the model hasn't been downloaded properly.
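
(For reference, one way to do that check on Linux; the file name is the one shown in the next message, and CertUtil is the Windows equivalent:)

sha256sum models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ/OpenAssistant-30B-epoch7-GPTQ-4bit-1024g.compat.no-act-order.safetensors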


V:\AI\oobabooga-windows\text-generation-webui\models\TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ>CertUtil -hashfile V:\AI\oobabooga-windows\text-generation-webui\models\TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ\OpenAssistant-30B-epoch7-GPTQ-4bit-1024g.compat.no-act-order.safetensors SHA256
SHA256 hash of V:\AI\oobabooga-windows\text-generation-webui\models\TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ\OpenAssistant-30B-epoch7-GPTQ-4bit-1024g.compat.no-act-order.safetensors:
1fe1a29b637f46da5bc48d9b97deeb87020edbd0c3765fddf1b7a0b207754534
CertUtil: -hashfile command completed successfully.

It's the same. It will generate one response before erroring.
If I use any character file, it errors without generating a response.

OK, please double-check you have the right GPTQ parameters for the model you're using. Apparently there's a bug in text-gen-ui at the moment where these params can get reset, so try:

  1. Load model
  2. On the Models page, set the GPTQ params: bits = 4, model_type = llama, groupsize = the appropriate one for the model you're using (128 or 1024)
  3. Click Reload the model
  4. Test

Let me know

I do. I checked like...12 times to make sure that I wasn't forgetting anything. I have the settings in the WebUI, I saved the parameters for that model, and I have the command line args.
Screenshot (1495).png

File "V:\AI\oobabooga-windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 362, in forward
weight = weight.reshape(-1, self.groupsize, weight.shape[2])
RuntimeError: shape '[-1, 1024, 6656]' is invalid for input of size 44302336
Output generated in 0.40 seconds (0.00 tokens/s, 0 tokens, context 231, seed 408650408)

I'm using the 1-click installer on Windows.

Before, it would generate one response and then error.
Now, it's just erroring.

Please try updating text-gen-ui using the update.bat in the 1-click installer. There was a bug fix recently related to GPTQ parameters not saving.

So:

  1. Close text-gen-ui
  2. Apply the update
  3. Re-open the UI
  4. Load my model
  5. Set the GPTQ parameters again
  6. Click "Save settings for this model"
  7. Click Reload this model
  8. Test and let me know.

Also, please show me a screenshot of the contents of your models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ folder

I've already tried re-installing and updating again just to be safe.

Screenshot (1497).png

EDIT: I tried re-installing GPTQ and it's not working, so I think something is wrong with my install. I'm gonna ask on the oobabooga GitHub because everything seems to be OK on the model side of things.

Yeah that all looks OK. You can delete pytorch_model.bin.index.json by the way, that file shouldn't be there. But it shouldn't break anything either.

Upon further research, it seems like I'm not the only person experiencing difficulties getting GPTQ to work on windows.

I'm probably just going to install linux because it seems like it's a better platform overall for this type of stuff.

Thanks for the help. I know what I need to do now.

Yeah, it is a lot easier on Linux. You could install WSL2 on top of Windows, and then you don't need to reboot. Your NVIDIA GPU will be supported. It works quite well, I'm told.

https://docs.nvidia.com/cuda/wsl-user-guide/index.html
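
(A minimal sketch of the WSL2 route, assuming a recent Windows 10/11 build; the CUDA-on-WSL specifics are covered in the guide linked above:)

# From an elevated PowerShell prompt; installs WSL2 with the default Ubuntu distro
wsl --install
# After rebooting and setting up Ubuntu, the Windows NVIDIA driver exposes the GPU inside WSL.
# Inside the distro you only install the CUDA toolkit, not a Linux display driver (per the guide above).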

I installed Pop!_OS, did all of the things, and had the SAME issue.

I found out that the --chat argument was the reason why it was showing that error.

Do you know if there is a way to get this to work with the --chat argument?

I don't really know too much about how any of this stuff works, and was actually surprised that --chat was the reason why it wasn't working.

DrSmurf changed discussion status to closed

@poisenbery I recently released a new model in the main branch that uses group_size = None and should resolve these problems.
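
(If it helps, the webui's bundled downloader can pull the updated main branch; the repo id here is assumed to match the local folder name used earlier:)

# Run from the text-generation-webui directory
python download-model.py TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ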
