How do I load this model?

#1
by sneedingface - opened

I have a 4090, but all 24 GB of VRAM still get filled. I tried "load-in-8bit" with the same result; I also tried "load-in-4bit", but I get the following log:

You are using device_map='auto' on a 4bit loaded version of the model. To automatically compute the appropriate device map, you should upgrade your accelerate library, pip install --upgrade accelerate or install it from source to support fp4 auto device map calculation. You may encounter unexpected behavior, or pass your own device map

As a result the model doesn't get loaded. I ran pip install --upgrade accelerate, but nothing changed. I also removed the --auto-devices flag; same story.
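For reference, this is roughly the load path I'm attempting (a minimal sketch of the standard transformers + bitsandbytes route; the model ID here is just a placeholder):

```python
# Minimal sketch of a 4-bit load via transformers + bitsandbytes.
# Passing an explicit device_map instead of "auto" sidesteps the
# fp4 auto-device-map warning on older accelerate versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/fp16-model"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # pin all weights to GPU 0 rather than device_map="auto"
)
```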
Please help!

Generally you'd need about 64 GB of VRAM for this model unless it's quantized. I'm not familiar with bitsandbytes' load-in-4bit, but I've heard it's not as good as GPTQ, so maybe it doesn't compress the model enough to fit in 24 GB. I have a 4-bit version posted if you'd wanna try that. I uploaded this fp16 model so people can create different quantizations from it.
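The back-of-envelope math for the weights alone (ignoring context and activation overhead) looks like this:

```python
# Rough weight-memory estimate for a 30B-parameter model.
params = 30e9
print(f"fp16:  {params * 2 / 1e9:.0f} GB")    # ~60 GB -> needs ~64 GB of VRAM
print(f"4-bit: {params * 0.5 / 1e9:.0f} GB")  # ~15 GB -> can fit in 24 GB
```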

I have a 4-bit version posted if you'd wanna try that

Oh wow, that's super! I didn't notice that.
Just to be sure, are you referring to Guanaco-SuperCOT-30b-GPTQ-4bit? Is that uncensored as well? Does it differ in any way from WizardLM-30B-Uncensored-Guanaco-SuperCOT-30b overall?
Thanks for your time man!

I was actually incorrect; I haven't made a 4-bit version. That guanaco-supercot one uses base LLaMA as its base model, while the WizardLM one uses WizardLM. I'll try to run the quantization later today.
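For anyone curious, the quantization step would look roughly like this with the AutoGPTQ library (just a sketch; a real run uses a proper calibration set, and the paths here are placeholders):

```python
# Hypothetical GPTQ quantization sketch using AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

fp16_dir = "path/to/fp16-model"       # placeholder
out_dir = "path/to/gptq-4bit-output"  # placeholder

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # common group size for LLaMA GPTQ quants
)

tokenizer = AutoTokenizer.from_pretrained(fp16_dir, use_fast=True)

# A real calibration set is hundreds of samples; one is shown for brevity.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

model = AutoGPTQForCausalLM.from_pretrained(fp16_dir, quantize_config)
model.quantize(examples)
model.save_quantized(out_dir)
```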

No problem, man. Once you're done, though, it would be great if you could include the parameters to run it in the model description, because I still haven't managed to run even Guanaco-SuperCOT-30b-GPTQ-4bit, and I want to rule out any potential incompatibilities or conflicts on my side (installation-wise), since I can run other LLaMA-based GPTQ 4-bit models without issues.
Thanks in advance.

If you're using KoboldAI, make sure you have the 4-bit version installed: https://github.com/0cc4m/koboldai. The install instructions are in the readme there.

The instructions for Oobabooga's TextGen UI are here (I don't use it, so I'm not familiar with how it works): https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md

I haven't forgotten, I just haven't had the time to do it yet

That's ok man, no need to be in a hurry

So I haven't been able to do it myself, but it looks like someone else has already made a 4-bit version:
https://huggingface.co/benjicolby/WizardLM-30B-Guanaco-SuperCOT-GPTQ-4bit
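If it helps, loading a prequantized GPTQ checkpoint like that from Python looks roughly like this with AutoGPTQ (a sketch; whether the repo ships safetensors, and what the weight file is named, are assumptions to check against the actual files):

```python
# Sketch of loading a prequantized GPTQ model with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "benjicolby/WizardLM-30B-Guanaco-SuperCOT-GPTQ-4bit"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    use_safetensors=True,  # assumption: the repo ships safetensors weights
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```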
