Update README.md
It was created with GPTQ-for-LLaMA, using group size 32 and act order set to true, to minimize the perplexity gap versus the FP16 model.
I HIGHLY suggest using ExLlama to avoid VRAM issues.
Use the following settings (max_seq_len = context length):
If max_seq_len = 4096, compress_pos_emb = 2
If max_seq_len = 8192, compress_pos_emb = 4
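The mapping above follows a simple ratio; a minimal sketch, assuming the model's native context length is 2048 tokens (the standard for LLaMA-family models, and consistent with the 4096 → 2 and 8192 → 4 pairs above):

```python
# Assumption: the base model was trained with a 2048-token context, so
# compress_pos_emb is the target context divided by the native context.
NATIVE_CTX = 2048

def compress_pos_emb(max_seq_len: int) -> int:
    """Return the compress_pos_emb value for a desired max_seq_len."""
    if max_seq_len % NATIVE_CTX != 0:
        raise ValueError("max_seq_len should be a multiple of the native context")
    return max_seq_len // NATIVE_CTX

print(compress_pos_emb(4096))  # → 2
print(compress_pos_emb(8192))  # → 4
```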