Doctor-Shotgun committed
Commit a7d176d
Parent(s): b66e2c4
Update README.md
README.md
CHANGED
@@ -16,4 +16,10 @@ Branches:
 - main: 4 decoder bits per weight, 6 head bits
   - ideal for 12GB GPUs, or 16GB GPUs with NTK extended context or CFG
 - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
-  - ideal for 16GB GPUs, or 24GB GPUs with NTK extended context or CFG
+  - ideal for 16GB GPUs, or 24GB GPUs with NTK extended context or CFG
+- 8bit-32g-h8: all tensors quantized to 8 bits with group size 32, 8 head bits
+  - experimental quant: exllamav2 monkeypatched to quantize all tensors to 8-bit 32g
+  - similar in size to the old GPTQ 8-bit (no group size); recommend a 24GB GPU
+- maxbpw-h8: ???bpw, 8 head bits
+  - experimental quant: the maximum optimized mixed-quant size that the current version of exllamav2 produces
+  - somewhat larger than 6.0bpw but not as large as 8-bit; recommend a 24GB GPU
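Each quant lives on its own branch of the repo, so you download the revision that matches your GPU. Below is a minimal sketch using `huggingface_hub`'s `snapshot_download`; the repo id and local directory are hypothetical placeholders, not the actual repository for this model:

```python
from huggingface_hub import snapshot_download

# Fetch a single quant branch (a Hugging Face "revision") of the model repo.
# NOTE: "Doctor-Shotgun/model-exl2" and local_dir are placeholder values;
# substitute the real repo id for this model.
snapshot_download(
    repo_id="Doctor-Shotgun/model-exl2",
    revision="6.0bpw-h6",                # branch name from the list above
    local_dir="models/model-6.0bpw-h6",  # where to place the weights
)
```

The `main` branch is what a plain `git clone` or a `snapshot_download` call without `revision` gives you, so only the non-default branches (6.0bpw-h6, 8bit-32g-h8, maxbpw-h8) need an explicit revision.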