Any chance of a GGML version for better perplexity?

#1
by Nexesenex - opened

I'm using the GPTQ version of this model side by side with Airoboros lxctx 16384 PI (GGML version).
I enjoy your work, Brandon, and it deserves more attention!
Any chance of some GGML versions (Q4_K_M or Q5_K_S/M) to surpass the GPTQ quality (which is usually closer to the Q3_K range) and get the most out of your model?
Thank you in any case.

I'd be happy to! My only concern is with how the PNTK embeddings would work with GGML. I'm just not very familiar with it. Any idea how this might work?

I'm not an expert, and the evolving NTK terminology and the lack of reference documentation "for end users" confuse me a bit.
But here's a llama.cpp PR thread (see also the appendices at the bottom) that might help you work out whether PNTK is already implemented, whatever name is used for it, even if only in unmerged PRs.
