Update README.md
README.md CHANGED
@@ -29,6 +29,13 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/h2ogpt-research-oasst1-llama-65B-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/h2oai/h2ogpt-research-oasst1-llama-65b)
 
+## Prompt template
+
+```
+<human>: prompt
+<bot>:
+```
+
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
@@ -72,10 +79,9 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         use_triton=use_triton,
         quantize_config=None)
 
-# Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''
-'''
+prompt_template=f'''<human>: {prompt}
+<bot>:'''
 
 print("\n\n*** Generate:")
 
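
For reference, here is a minimal sketch of how the corrected template slots into the README's AutoGPTQ example. The GPTQ repo id is an assumption inferred from the GGML and fp16 links above, and the generation parameters are illustrative; the loading call itself follows the snippet in the diff.

```
# Minimal sketch: the corrected <human>/<bot> prompt template combined
# with the AutoGPTQ loading code shown in the diff above.
# NOTE: the repo id is an assumption inferred from the sibling GGML/fp16
# links; generation parameters are illustrative, not from the README.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/h2ogpt-research-oasst1-llama-65B-GPTQ"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        device="cuda:0",
        use_triton=False,
        quantize_config=None)

prompt = "Tell me about AI"
# The h2oGPT/OASST1 template introduced by this commit:
prompt_template = f'''<human>: {prompt}
<bot>:'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, do_sample=True,
                        temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the template ends at `<bot>:` with no trailing newline, the model continues directly with the assistant's reply, which is what the corrected f-string in the diff produces.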