Update README.md

README.md CHANGED
@@ -29,6 +29,15 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/lmsys/vicuna-7b-v1.3)
 
+## Prompt template
+
+```
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+
+USER: prompt
+ASSISTANT:
+```
+
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
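The Python change in the next hunk fills in only the USER/ASSISTANT turns. A minimal sketch of building the full template string, including the system preamble from the fenced block above; the variable names here are illustrative, not from the README:

```python
# Minimal sketch: assembling the full Vicuna v1.3 prompt, combining the
# system preamble with the USER/ASSISTANT turns. Variable names are illustrative.
system_preamble = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
prompt = "Tell me about AI"
prompt_template = f'''{system_preamble}

USER: {prompt}
ASSISTANT:'''
print(prompt_template)
```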
@@ -74,8 +83,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''
-
+prompt_template=f'''USER: {prompt}
+ASSISTANT:'''
 
 print("\n\n*** Generate:")
 
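For orientation, this hunk edits a longer Python example in the README. A minimal end-to-end sketch of that flow, assuming the auto-gptq `AutoGPTQForCausalLM.from_quantized` API named in the hunk header plus transformers; the loading keyword arguments and generation parameters are illustrative assumptions, not taken from this diff:

```python
# Minimal sketch of the surrounding README example, assuming auto-gptq + transformers.
# Loading kwargs and generation parameters are illustrative, not taken from this diff.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/vicuna-7B-v1.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        device="cuda:0")

# Note: check the prompt template is correct for this model.
prompt = "Tell me about AI"
prompt_template = f'''USER: {prompt}
ASSISTANT:'''

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```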
@@ -106,12 +115,13 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 **vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors**
 
-This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
+This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
 It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
 
 * `vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
+  * Works with ExLlama.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = 128. Act Order / desc_act = False.
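To make the final parameter bullet concrete: in auto-gptq, `group_size` and `desc_act` are fields of `BaseQuantizeConfig`. A sketch of the configuration those bullets describe, assuming the auto-gptq API; it illustrates the parameters and is not the exact script used to produce this file:

```python
# Sketch: the quantisation settings described above, expressed as an
# auto-gptq BaseQuantizeConfig. Illustrative only; not the exact script
# used to produce this file.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # Groupsize = 128, for better inference accuracy
    desc_act=False,  # Act Order / desc_act = False, for compatibility and speed
)
```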