An 8-bit version of the model

#12

by varun500 - opened May 4, 2023

base: refs/heads/main ← from: refs/pr/12

+13 -93432

Files changed (5)

README.md +9 -41
quantize_config.json +0 -5
tokenizer.json +0 -0
tokenizer_config.json +1 -1
vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors +3 -0

README.md CHANGED

@@ -1,21 +1,7 @@
 ---
 license: other
 inference: false
-pipeline_tag: conversational
 ---
-<!-- header start -->
-<div style="width: 100%;">
-    <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
-</div>
-<div style="display: flex; justify-content: space-between; width: 100%;">
-    <div style="display: flex; flex-direction: column; align-items: flex-start;">
-        <p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
-    </div>
-    <div style="display: flex; flex-direction: column; align-items: flex-end;">
-        <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
-    </div>
-</div>
-<!-- header end -->
 # Vicuna 13B 1.1 GPTQ 4bit 128g
 This is a 4-bit GPTQ version of the [Vicuna 13B 1.1 model](https://huggingface.co/lmsys/vicuna-13b-delta-v1.1).
@@ -35,12 +21,18 @@ I have the following Vicuna 1.1 repositories available:
 **13B models:**
 * [Unquantized 13B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-13B-1.1-HF)
 * [GPTQ quantized 4bit 13B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/vicuna-13B-1.1-GGML)
 **7B models:**
 * [Unquantized 7B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-7B-1.1-HF)
 * [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/vicuna-7B-1.1-GGML)
 ## How to easily download and use this model in text-generation-webui
@@ -122,30 +114,6 @@ Then link that into `text-generation-webui/repositories` as described above.
 Or just use `vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt` as mentioned above, which should work without any upgrades to text-generation-webui.
-<!-- footer start -->
-## Discord
-For further support, and discussions on these models and AI in general, join us at:
-[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)
-## Thanks, and how to contribute.
-Thanks to the [chirper.ai](https://chirper.ai) team!
-I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
-If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
-Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
-* Patreon: https://patreon.com/TheBlokeAI
-* Ko-Fi: https://ko-fi.com/TheBlokeAI
-**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.
-Thank you to all my generous patrons and donaters!
-<!-- footer end -->
 # Vicuna Model Card
 ## Model details

 ---
 license: other
 inference: false
 ---
 # Vicuna 13B 1.1 GPTQ 4bit 128g
 This is a 4-bit GPTQ version of the [Vicuna 13B 1.1 model](https://huggingface.co/lmsys/vicuna-13b-delta-v1.1).
 **13B models:**
 * [Unquantized 13B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-13B-1.1-HF)
 * [GPTQ quantized 4bit 13B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g)
 **7B models:**
 * [Unquantized 7B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-7B-1.1-HF)
 * [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
+**GGMLs for CPU inference**
+I removed the GGMLs I originally made for Vicuna 1.1 because they were directly converted GPTQ -> GGML and this seemed to give poor results
+Instead I recommend you use eachadea's GGMLs:
+* [eachadea's Vicuna 13B 1.1 GGML format for `llama.cpp`](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1)
+* [eachadea's Vicuna 7B 1.1 GGML format for `llama.cpp`](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
 ## How to easily download and use this model in text-generation-webui
 Or just use `vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt` as mentioned above, which should work without any upgrades to text-generation-webui.
 # Vicuna Model Card
 ## Model details

quantize_config.json DELETED

@@ -1,5 +0,0 @@
-{
-  "bits": 4,
-  "desc_act": false,
-  "group_size": 128
-}
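
The deleted `quantize_config.json` held three GPTQ settings. As a rough illustration of how a loader might read such a file, here is a minimal Python sketch. The `QuantizeConfig` class and its validation rules are hypothetical (they are not taken from this repository or from any specific library); only the three key names and their values come from the hunk above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizeConfig:
    """Hypothetical container for the three keys seen in quantize_config.json."""

    bits: int
    group_size: int
    desc_act: bool

    def __post_init__(self) -> None:
        # Assumed sanity checks; a real loader may accept other values.
        if self.bits <= 0:
            raise ValueError(f"bits must be positive, got {self.bits}")
        if self.group_size <= 0:
            raise ValueError(f"group_size must be positive, got {self.group_size}")
        if not isinstance(self.desc_act, bool):
            raise TypeError("desc_act must be a boolean")

    @classmethod
    def from_json(cls, text: str) -> "QuantizeConfig":
        """Parse the JSON text and keep only the three known keys."""
        data = json.loads(text)
        return cls(
            bits=data["bits"],
            group_size=data["group_size"],
            desc_act=data["desc_act"],
        )


# The file content as it appeared before this PR deleted it.
CONFIG_TEXT = """
{
  "bits": 4,
  "desc_act": false,
  "group_size": 128
}
"""

config = QuantizeConfig.from_json(CONFIG_TEXT)
print(config)
```

The values line up with the `4bit-128g` part of the model name. Reading `"desc_act": false` as "no act-order" is an assumption based on the `no-act-order` file name mentioned in the README.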

tokenizer.json DELETED

The diff for this file is too large to render.

tokenizer_config.json CHANGED

@@ -30,4 +30,4 @@
     "rstrip": false,
     "single_word": false
   }
-}

     "rstrip": false,
     "single_word": false
   }
+}

vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
+size 7255159218
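
The three added lines are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 digest, and the size in bytes of the real file. A minimal Python sketch of how one might check a downloaded copy against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the simple `key value` line layout shown above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest named by the pointer."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer content copied from the diff above (without the leading "+").
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
size 7255159218
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a local download, call `matches_pointer("vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors", pointer)` from the directory that holds the file. The recorded size, 7255159218 bytes, is roughly 7.26 GB.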