
8bit version of the model

#8
by varun500 - opened
No description provided.
varun500 changed pull request title from *bit version of the model to 8bit version of the model

An 8bit version of the model would be helpful, as it could be loaded in 16GB of GPU VRAM
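As a rough sanity check on that VRAM figure (an approximation added here, not from the thread, and ignoring activations and overhead), a 13B-parameter model stored at 8-bit precision needs about one byte per weight:

```python
# Back-of-envelope VRAM estimate for a 13B model at 8-bit (int8) precision.
# Overhead (activations, KV cache, CUDA context) is ignored in this sketch.
params = 13_000_000_000          # ~13B parameters
bytes_per_param = 1              # int8 = 1 byte per weight
weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB of weights")  # ≈ 12.1 GiB, so 16 GB VRAM is plausible
```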

  1. This is a 4bit GPTQ model. I could make an 8bit GPTQ, but there's no point because HF models can already be loaded in 8bit using bitsandbytes
  2. If you want 8bit, please use https://huggingface.co/TheBloke/stable-vicuna-13B-HF and specify load_in_8bit=True, as I told you on GitHub
TheBloke changed pull request status to closed

Sure, will do that
