Update README.md
README.md CHANGED
@@ -28,6 +28,20 @@ I have the following Vicuna 1.1 repositories available:
* [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
* [GPTQ quantized 4bit 7B 1.1 for CPU - GGML format for `llama.cpp`](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g-GGML)

+ ## How to easily download and use this model in text-generation-webui
+
+ Load text-generation-webui as you normally do.
+
+ 1. Click the **Model tab**.
+ 2. Under **Download custom model or LoRA**, enter this repo name: `TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g`.
+ 3. Click **Download**.
+ 4. Wait until it says it's finished downloading.
+ 5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`
+ 6. Now click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**: choose this model: `vicuna-13B-1.1-GPTQ-4bit-128g`.
+ 8. Click **Reload the Model** in the top right.
+ 9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
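As an alternative to the GUI download in step 2 above, the repo can also be fetched from the command line. A minimal sketch, assuming `git` and `git-lfs` are installed; the target directory is only an example of where text-generation-webui looks for models:

```bash
# Clone the model repo directly into text-generation-webui's models folder.
# git-lfs is required to pull the large model files.
git lfs install
git clone https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g \
  models/vicuna-13B-1.1-GPTQ-4bit-128g
```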
## GIBBERISH OUTPUT

If you get gibberish output, it is because you are using the `safetensors` file without updating GPTQ-for-LLaMa.
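The fix is to update GPTQ-for-LLaMa before loading the `safetensors` file. A minimal build sketch, assuming a CUDA toolchain; the exact branch and setup steps may differ in your environment:

```bash
# Fetch and build the latest GPTQ-for-LLaMa code.
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
# Compile the custom CUDA kernel (requires nvcc and a matching PyTorch).
python setup_cuda.py install
```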
@@ -43,17 +57,18 @@ Either way, please read the instructions below carefully.
Two model files are provided. Ideally use the `safetensors` file. Full details below:

Details of the files provided:
- * `vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
-   * `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
-   * Command to create:
-     * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
- * `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`
+ * `vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt`
  * `pt` format file, created without the `--act-order` flag.
  * This file may have slightly lower quality, but is included as it can be used without needing to compile the latest GPTQ-for-LLaMa code.
-   * It
+   * It will therefore work with one-click-installers on Windows, which include the older GPTQ-for-LLaMa code.
  * Command to create:
    * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --groupsize 128 --save vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`

+ * `vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors`
+   * `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
+   * Command to create:
+     * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
+
## How to run in `text-generation-webui`

File `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
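For command-line use, the same GPTQ settings from step 5 can be passed when launching the webui instead of filling in the GUI fields. A sketch, assuming a 2023-era text-generation-webui checkout that supports these flags:

```bash
# Start text-generation-webui with the GPTQ parameters set up front,
# mirroring Bits = 4, Groupsize = 128, model_type = Llama from the GUI steps.
python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g \
  --wbits 4 --groupsize 128 --model_type llama
```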