## GIBBERISH OUTPUT IN `text-generation-webui`?

Please read the Provided Files section below. You should use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.

If you're using a text-generation-webui one-click installer, you MUST use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

## Provided files

Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa.**

If you do an automatic download with `text-generation-webui`, it will pick the 'compat' file, which should work for everyone.

Specifically, the 'latest' file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

* `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers
  * Works on Windows
  * Command used to create it:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
    ```
* `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. act-order.
  * Command used to create it:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
    ```
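To use the 'latest' act-order file you need up-to-date GPTQ-for-LLaMa code. A minimal sketch of swapping it into a standard `text-generation-webui` checkout follows; the `repositories/` path is an assumption about your install layout, so adapt as needed:

```
# Assumes text-generation-webui keeps its GPTQ-for-LLaMa copy in repositories/
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
```

Note that the Triton branch of that repo generally requires Linux; Windows users should stick with the 'compat' file.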
## How to easily download and use a model in text-generation-webui

Load text-generation-webui as you normally do.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter the repo name to download: `TheBloke/stable-vicuna-13B-GPTQ`.
3. Click **Download**.
4. Wait until it says it's finished downloading.
5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
6. Now click the **Refresh** icon next to **Model** in the top left.
7. In the **Model drop-down**, choose the model you just downloaded, e.g. `stable-vicuna-13B-GPTQ`.
8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
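If you prefer to fetch the model outside the UI, a plain git clone of the Hugging Face repo also works. A minimal sketch (assumes `git-lfs` is installed; run it from your models directory):

```
# Fetch all repo files, including the .safetensors weights, via git-lfs
git lfs install
git clone https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
```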
## Manual instructions for `text-generation-webui`

File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).
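For reference, the GPTQ settings from step 5 above can also be passed on the command line when starting the UI. A rough sketch (flag names assume a contemporary text-generation-webui checkout; check `python server.py --help` on yours):

```
# Launch text-generation-webui with the GPTQ parameters set up front
python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```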