A new HQQ Quantization in <16GB VRAM
#8
by
Chronal
After discovering @ProphetOfBostrom's HQQ quant of this model, I had to try making one myself. This version uses metadata offloading, which wasn't available before, so the model can be loaded in under 16GB of VRAM.
I haven't done any in-depth testing yet, but the quant seems to generate sensible output so far.
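For anyone curious how a quant like this is produced, here is a minimal sketch using the hqq library's high-level wrapper. It assumes the `offload_meta` flag in `BaseQuantizeConfig` is what enables metadata offloading; the model id and the bit/group settings below are placeholders, not the exact settings used for this upload.

```python
# Rough sketch of building an HQQ quant with metadata offloading.
# Assumes the mobiusml/hqq package; the model id and quantization settings
# below are placeholders, not necessarily what this repo used.
import torch
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

model_id = "org/model-name"  # placeholder: the base model being quantized

# offload_meta=True keeps the quantization metadata (scales/zero-points)
# off the GPU, which is what lets the quantized model fit in less VRAM.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, offload_meta=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = HQQModelForCausalLM.from_pretrained(model_id)
model.quantize_model(
    quant_config=quant_config,
    compute_dtype=torch.float16,
    device="cuda",
)

# Save the quantized weights so they can later be reloaded directly
# with HQQModelForCausalLM.from_quantized(...).
model.save_quantized("model-hqq-4bit")
```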