Doctor-Shotgun committed on
Commit a7d176d
1 Parent(s): b66e2c4

Update README.md

Files changed (1): README.md +7 -1
README.md CHANGED
@@ -16,4 +16,10 @@ Branches:
 - main: 4 decoder bits per weight, 6 head bits
   - ideal for 12gb GPUs, or 16gb GPUs with NTK extended context or CFG
 - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
-  - ideal for 16gb GPUs, or 24gb GPUs with NTK extended context or CFG
+  - ideal for 16gb GPUs, or 24gb GPUs with NTK extended context or CFG
+- 8bit-32g-h8: all tensors 8bit 32g, 8 head bits
+  - experimental quant, this is with exllamav2 monkeypatched to quantize all tensors to 8bit 32g
+  - similar in size to old GPTQ 8bit no groupsize, recommend 24gb GPU
+- maxbpw-h8: ???bpw, 8 head bits
+  - experimental quant, this is the maximum optimized mixed quant size that the current version of exllamav2 produces
+  - somewhat larger than 6.0bpw but not as large as 8bit, recommend 24gb GPU
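As a rough sanity check on the GPU-size recommendations in the diff above: weight storage scales linearly with bits per weight, roughly bytes ≈ parameters × bpw / 8. A minimal sketch of that arithmetic (the 13B parameter count is a stand-in assumption, not stated in this commit; actual VRAM use is higher once the KV cache, activations, and the higher-bit head layer are counted):

```python
def weights_gb(n_params: float, bpw: float) -> float:
    # Storage for the quantized weights alone: params * bits-per-weight,
    # divided by 8 bits per byte, reported in gigabytes.
    # Ignores KV cache, activations, and head-bit overhead.
    return n_params * bpw / 8 / 1e9

# Hypothetical 13B-parameter model at each branch's decoder bpw:
for bpw in (4.0, 6.0, 8.0):
    print(f"{bpw} bpw -> ~{weights_gb(13e9, bpw):.1f} GB of weights")
# prints ~6.5 GB at 4.0 bpw, ~9.8 GB at 6.0 bpw, ~13.0 GB at 8.0 bpw
```

This is why 4 bpw fits comfortably on a 12gb card while the 8bit branch wants 24gb: the headroom left over is what extended context or CFG consumes.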