Model outputs gibberish instead of an actual response.
When I try to use this model with the oobabooga web UI I get responses like the one below, and I don't know why:
Input:
introduce yourself
Output:
/_mysinside phys chairphys AlcUSTontmymoGP�≠ monuments _ _alu _ _concurrent jsf preced///_mysmysmysmys _ fsmys/_mysmys _mys _ _ _ _ _ _ / phys phys/ phys _ mys _mysmys _leepдра/ Phys/_mysmys/_mys _ _mysmys précéd _mysextend _mys _ _mysmys _ _ _ _ _ _Physmys _mysmys _mysmysmys _ Alcmysmys _ _ Alc _ AlcWF Alc _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Alc Alc _g _ _ Alc _ Alc _ _ _ Alc Alc _ _ _ Alc Alc Alc _ _ Alc Alc _ Alc Alc Alc Alc Alc _ _ _ _ _ _ _ _ Alc _o Alc _mymymy _ _ _ _ _ _ _mymymymymymymymymymymymy _PR _ont _ontmyontmymyont Alc
Please delete the file ending in latest.act-order.safetensor and load the file compat.no-act-order.safetensor instead.
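If you're not sure which file the web UI is actually picking up, here is a minimal sketch (the model folder path is just a placeholder, adjust it to wherever your models live) that lists the safetensors files in the folder and flags any act-order variant still present:

```python
# Minimal sketch: list the safetensors files in a model folder and flag the
# act-order variant. The folder path below is a placeholder, not a real path.
from pathlib import Path

model_dir = Path("models/your-model-folder")  # placeholder path

for f in sorted(model_dir.glob("*.safetensor*")):
    if "no-act-order" in f.name:
        print(f"{f.name}  <- keep and load this one")
    elif "act-order" in f.name:
        print(f"{f.name}  <- act-order file; delete or move it out of the folder")
    else:
        print(f.name)
```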
Oh sure, happy to. I kept checking the Nomic repo around the time they first released it, but it was never uploaded to HF. But I see it has been now.
I'm starting the process now!
Forgot to come back here and say, it's done!
I'm having the same problem, but I can't find the files you specified. Where should I look for them, or where can I download them?
Hey Bloke, I tried with both the 4-bit quantised 7B and 13B .safetensors models.
The final output looks like gibberish. Can you please let me know what I'm missing in the code below?
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

# Local folder containing the quantised checkpoint
quantized_model_dir = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder"
model_basename = "/content/drive/MyDrive/Vicuna/FastChat/models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g_actorder/vicuna-7B-1.1-GPTQ-4bit-128g"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)

# Quantisation config passed to AutoGPTQ
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False
)

model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=quantize_config
)

prompt = """ """
inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
tokens = model.generate(
    **inputs,
    max_new_tokens=2000,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    truncation=True
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
There was a bug in AutoGPTQ 0.3.0 that caused gibberish output when using a model quantised with both group_size and desc_act.
It can be fixed by updating to AutoGPTQ 0.3.1 or 0.3.2. I recommend building from source at the moment due to some issues people are having installing from PyPI:
```
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
```
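As a quick sanity check after reinstalling, something along these lines confirms which auto-gptq version is installed and reloads the model. The paths are placeholders, and desc_act=True is only an assumption for an act-order checkpoint; match it to how your model was actually quantised:

```python
# Sanity check after rebuilding AutoGPTQ from source. Paths and the quantise
# settings below are placeholders/assumptions; adjust them for your model.
from importlib.metadata import version
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

print("auto-gptq version:", version("auto-gptq"))  # expect 0.3.1 or later

model_dir = "path/to/your/gptq-model"          # placeholder
model_basename = "your-model-GPTQ-4bit-128g"   # placeholder

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,  # assumption: set True only if the checkpoint used act-order
)

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    quantize_config=quantize_config,
)

inputs = tokenizer("Introduce yourself.", return_tensors="pt").to("cuda")
tokens = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

If the output is still gibberish after the update, it's worth double-checking that the version printed really is 0.3.1 or later and that the old 0.3.0 install was fully removed.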