Update README.md

README.md CHANGED
@@ -29,6 +29,15 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/lmsys/vicuna-7b-v1.3)
 
+## Prompt template
+
+```
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+
+USER: prompt
+ASSISTANT:
+```
+
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
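The Python change in the next hunk fills in only the USER/ASSISTANT turns. A minimal sketch of building the full template string, including the system preamble from the fenced block above; the variable names here are illustrative, not from the README:

```python
# Minimal sketch: assembling the full Vicuna v1.3 prompt, combining the
# system preamble with the USER/ASSISTANT turns. Variable names are illustrative.
system_preamble = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
prompt = "Tell me about AI"
prompt_template = f'''{system_preamble}

USER: {prompt}
ASSISTANT:'''
print(prompt_template)
```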
@@ -74,8 +83,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''
-
+prompt_template=f'''USER: {prompt}
+ASSISTANT:'''
 
 print("\n\n*** Generate:")
 
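For orientation, this hunk edits a longer Python example in the README. A minimal end-to-end sketch of that flow, assuming the auto-gptq `AutoGPTQForCausalLM.from_quantized` API named in the hunk header plus transformers; the loading keyword arguments and generation parameters are illustrative assumptions, not taken from this diff:

```python
# Minimal sketch of the surrounding README example, assuming auto-gptq + transformers.
# Loading kwargs and generation parameters are illustrative, not taken from this diff.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/vicuna-7B-v1.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        device="cuda:0")

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt}
ASSISTANT:'''

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```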
@@ -106,12 +115,13 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 **vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors**
 
-This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
+This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
 It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
 
 * `vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
+  * Works with ExLlama.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = 128. Act Order / desc_act = False.
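To make the final parameter bullet concrete: in auto-gptq, `group_size` and `desc_act` are fields of `BaseQuantizeConfig`. A sketch of the configuration those bullets describe, assuming the auto-gptq API; it illustrates the parameters and is not the exact script used to produce this file:

```python
# Sketch: the quantisation settings described above, expressed as an
# auto-gptq BaseQuantizeConfig. Illustrative only; not the exact script
# used to produce this file.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # Groupsize = 128, for better inference accuracy
    desc_act=False,  # Act Order / desc_act = False, for compatibility and speed
)
```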