Update README.md
It was created with GPTQ-for-LLaMA, using group size 32 and act order set to true, to minimize the perplexity gap versus the FP16 model.
I HIGHLY suggest using ExLlama to avoid VRAM issues.
Use the following settings (max_seq_len = context length):
If max_seq_len = 4096, compress_pos_emb = 2
If max_seq_len = 8192, compress_pos_emb = 4
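The mapping above follows a simple ratio; a minimal sketch, assuming the model's native context length is 2048 tokens (the standard for LLaMA-family models, and consistent with the 4096 → 2 and 8192 → 4 pairs above):

```python
# Assumption: the base model was trained with a 2048-token context, so
# compress_pos_emb is the target context divided by the native context.
NATIVE_CTX = 2048

def compress_pos_emb(max_seq_len: int) -> int:
    """Return the compress_pos_emb value for a desired max_seq_len."""
    if max_seq_len % NATIVE_CTX != 0:
        raise ValueError("max_seq_len should be a multiple of the native context")
    return max_seq_len // NATIVE_CTX

print(compress_pos_emb(4096))  # → 2
print(compress_pos_emb(8192))  # → 4
```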