An 8-bit version of the model

#12
by varun500 - opened
README.md CHANGED
@@ -1,21 +1,7 @@
  ---
  license: other
  inference: false
- pipeline_tag: conversational
  ---
- <!-- header start -->
- <div style="width: 100%;">
- <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
- </div>
- <div style="display: flex; justify-content: space-between; width: 100%;">
- <div style="display: flex; flex-direction: column; align-items: flex-start;">
- <p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
- </div>
- <div style="display: flex; flex-direction: column; align-items: flex-end;">
- <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
- </div>
- </div>
- <!-- header end -->
  # Vicuna 13B 1.1 GPTQ 4bit 128g
 
  This is a 4-bit GPTQ version of the [Vicuna 13B 1.1 model](https://huggingface.co/lmsys/vicuna-13b-delta-v1.1).
@@ -35,12 +21,18 @@ I have the following Vicuna 1.1 repositories available:
  **13B models:**
  * [Unquantized 13B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-13B-1.1-HF)
  * [GPTQ quantized 4bit 13B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g)
- * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/vicuna-13B-1.1-GGML)
-
+
  **7B models:**
  * [Unquantized 7B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-7B-1.1-HF)
  * [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
- * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/vicuna-7B-1.1-GGML)
+
+ **GGMLs for CPU inference**
+
+ I removed the GGMLs I originally made for Vicuna 1.1 because they were directly converted GPTQ -> GGML and this seemed to give poor results
+
+ Instead I recommend you use eachadea's GGMLs:
+ * [eachadea's Vicuna 13B 1.1 GGML format for `llama.cpp`](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1)
+ * [eachadea's Vicuna 7B 1.1 GGML format for `llama.cpp`](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
 
  ## How to easily download and use this model in text-generation-webui
 
@@ -122,30 +114,6 @@ Then link that into `text-generation-webui/repositories` as described above.
 
  Or just use `vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt` as mentioned above, which should work without any upgrades to text-generation-webui.
 
- <!-- footer start -->
- ## Discord
-
- For further support, and discussions on these models and AI in general, join us at:
-
- [TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)
-
- ## Thanks, and how to contribute.
-
- Thanks to the [chirper.ai](https://chirper.ai) team!
-
- I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
-
- If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
-
- Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
-
- * Patreon: https://patreon.com/TheBlokeAI
- * Ko-Fi: https://ko-fi.com/TheBlokeAI
-
- **Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.
-
- Thank you to all my generous patrons and donaters!
- <!-- footer end -->
  # Vicuna Model Card
 
  ## Model details
quantize_config.json DELETED
@@ -1,5 +0,0 @@
- {
- "bits": 4,
- "desc_act": false,
- "group_size": 128
- }
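
For readers unsure what the removed `quantize_config.json` recorded, here is a minimal sketch that parses the three fields shown in the diff and turns them into a short label. The `QuantizeConfig` class, the `tag` method, and the label format are hypothetical helpers written for illustration; they are not part of any quantization library, and the mapping of `desc_act` to "act-order" is an assumption based on the `no-act-order` file naming used in this repository.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizeConfig:
    """Hypothetical container for the three fields in quantize_config.json."""
    bits: int
    group_size: int
    desc_act: bool

    def tag(self) -> str:
        # Assumption: desc_act corresponds to the "act-order" naming in the repo's files.
        order = "act-order" if self.desc_act else "no-act-order"
        return f"{self.bits}bit-{self.group_size}g.{order}"


def parse_quantize_config(text: str) -> QuantizeConfig:
    """Parse the JSON text and fail loudly if a field is missing."""
    raw = json.loads(text)
    return QuantizeConfig(
        bits=int(raw["bits"]),
        group_size=int(raw["group_size"]),
        desc_act=bool(raw["desc_act"]),
    )


# The file content exactly as it appears in the deleted hunk.
REMOVED_CONFIG = """
{
  "bits": 4,
  "desc_act": false,
  "group_size": 128
}
"""

config = parse_quantize_config(REMOVED_CONFIG)
print(config.tag())  # 4bit-128g.no-act-order
```

The label lines up with the `4bit-128g` and `no-act-order` parts of the file names mentioned in the README, which is why the values are worth noting before the file disappears from the repository.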
tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -30,4 +30,4 @@
  "rstrip": false,
  "single_word": false
  }
- }
+ }
vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
+ size 7255159218
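
The three added lines are a Git LFS pointer, not the weights themselves: they give the expected SHA-256 digest and byte size of the real `.safetensors` file. As a rough sketch, a downloaded copy could be checked against the pointer like this. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers for illustration, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer names."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer content copied from the added file in this diff.
POINTER = """
version https://git-lfs.github.com/spec/v1
oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
size 7255159218
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 7255159218
```

With the file downloaded locally, `matches_pointer("vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors", pointer)` would return `True` only for a complete, uncorrupted copy.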