Question about 1-bit quant

#2
by ThomasBaruzier - opened

Hello,

You claim your 1-bit quant is "custom".
Could you please elaborate on how it was made, and whether it is higher quality than a traditional IQ1_S or IQ1_M quant?

Thanks.

Owner

Only ~92% of the weights are 1-bit, so I had to rewrite llama.cpp to produce that custom quant.
I also have not uploaded the files yet.
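
For a rough sense of what such a mix means in bits-per-weight, here is a small back-of-the-envelope sketch. The 92%/8% split, the ~1.56 bpw (IQ1_S-style) and 4.5 bpw (Q4_K-style) figures are assumptions for illustration, not the confirmed recipe:

```python
# Rough estimate of the effective bits-per-weight (bpw) of a mixed quant.
# The split and per-type bpw values below are illustrative assumptions,
# not the actual recipe used for this model.

def effective_bpw(split):
    """split: list of (fraction_of_weights, bits_per_weight) pairs."""
    assert abs(sum(f for f, _ in split) - 1.0) < 1e-6
    return sum(f * b for f, b in split)

# Hypothetical mix: 92% at ~1.56 bpw (IQ1_S-style codes) + 8% at ~4.5 bpw (Q4_K-style)
print(effective_bpw([(0.92, 1.56), (0.08, 4.5)]))  # ~1.8 bpw overall
```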

Thank you for the answer.
If you plot model size vs. PPL for the two closest quants, would this custom quant yield lower, equal, or higher perplexity? If there is a real benefit, it might be worth sharing your findings in the llama.cpp repo.
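
To make the comparison concrete, here is a minimal sketch of the check I have in mind: whether the custom quant falls below the straight line joining IQ1_S and IQ1_M in size-vs-PPL space. The function and variable names are hypothetical, and all (size, PPL) values would need to come from real llama-perplexity runs; nothing here is measured.

```python
# Check whether a candidate quant beats the linear interpolation between
# two reference quants in (model size, perplexity) space.

def beats_interpolation(ref_small, ref_large, candidate):
    """Each argument is a (size_gb, ppl) tuple.
    Returns True if `candidate` has lower PPL than the straight-line
    interpolation between the two reference quants at its size."""
    (s1, p1), (s2, p2), (sc, pc) = ref_small, ref_large, candidate
    t = (sc - s1) / (s2 - s1)            # candidate's position between the references
    expected_ppl = p1 + t * (p2 - p1)    # PPL the interpolation predicts at that size
    return pc < expected_ppl

# Usage (fill in with measured values from llama-perplexity):
# print(beats_interpolation(iq1_s, iq1_m, custom_quant))
```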
