File size: 616 Bytes
c47c96e ca5ff77 7e08fc5 353231d da86bfd |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
---
license: other
---
[WizardLM-33B-V1.0-Uncensored](https://huggingface.co/ehartford/WizardLM-33B-V1.0-Uncensored) merged with kaiokendev's [33b SuperHOT 8k LoRA](https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test), quantized at 4 bit.
It was created with GPTQ-for-LLaMA with group size 32 and act order true as parameters, to get the maximum perplexity vs FP16 model.
I HIGHLY suggest to use exllama, to evade some VRAM issues.
Use compress_pos_emb = 4 for any context up to 8192 context.
If you have 2x24 GB VRAM GPUs cards, to not get Out of Memory errors at 8192 context, use:
gpu_split: 9,21 |