A new HQQ Quantization in <16GB VRAM
#8
by
Chronal
After discovering @ProphetOfBostrom's HQQ quant of this model, I had to try making one myself. This version uses metadata offloading, which wasn't available before, so the model can be loaded in under 16GB of VRAM.
I haven't done any in-depth testing yet, but the quant seems to generate sensible output so far.
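For anyone curious how a quant like this is produced, here is a minimal sketch using the hqq library's high-level wrapper. It assumes the `offload_meta` flag in `BaseQuantizeConfig` is what enables metadata offloading; the model id and the bit/group settings below are placeholders, not the exact settings used for this upload.

```python
# Rough sketch of building an HQQ quant with metadata offloading.
# Assumes the mobiusml/hqq package; the model id and quantization settings
# below are placeholders, not necessarily what this repo used.
import torch
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

model_id = "org/model-name"  # placeholder: the base model being quantized

# offload_meta=True keeps the quantization metadata (scales/zero-points)
# off the GPU, which is what lets the quantized model fit in less VRAM.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, offload_meta=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = HQQModelForCausalLM.from_pretrained(model_id)
model.quantize_model(
    quant_config=quant_config,
    compute_dtype=torch.float16,
    device="cuda",
)

# Save the quantized weights so they can later be reloaded directly
# with HQQModelForCausalLM.from_quantized(...).
model.save_quantized("model-hqq-4bit")
```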