Question
Command used for quant? I'm debating whether to upload a similar quant I made, so I'm wondering what settings you used. I'm also trying to achieve the best output at 8-bit.
How are you measuring perplexity?
Nothing special, I just used the basic convert script with a target bpw of 9:

python .\convert.py -i H:\Downloads\Gryphe_MythoMax-L2-13b\ -o .\work-test -c .\0000.parquet -b 9 -m ..\..\measurement-hacked.json
I think there's also an option to change the head bpw, which defaults to 6, but I didn't know about it at the time so I guess my model is slightly smaller than MAX. Oh well.
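For anyone wanting to try that, here's a sketch of the same command with the head-precision flag added. The `-hb` flag name is how I understand exllamav2's convert.py exposes it (default 6); double-check with `python convert.py -h` on your version before relying on it:

```shell
# Same conversion as above, but with the head layer raised to 8 bpw.
# -hb (head bits) controls the bpw of the output head layer; it is
# believed to default to 6 in exllamav2's convert.py -- verify with -h.
python .\convert.py -i H:\Downloads\Gryphe_MythoMax-L2-13b\ -o .\work-test -c .\0000.parquet -b 9 -hb 8 -m ..\..\measurement-hacked.json
```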
The script spits out a base perplexity when it starts quantizing and another when it's done; that's what I was comparing. Make sure to check those: the newest version of exllamav2 has a few bugs that can cause it to run into precision issues and spit out a very large ppl, which is a sign that your quant is messed up. It'll still work, but suboptimally.
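If you want to script that sanity check on the two numbers the script prints, a minimal sketch; the 5% tolerance here is my own arbitrary assumption, not anything exllamav2 defines:

```python
def quant_looks_sane(base_ppl: float, quant_ppl: float,
                     max_rel_increase: float = 0.05) -> bool:
    """Flag a quant whose perplexity blew up relative to the base model.

    max_rel_increase is an arbitrary tolerance (5% here), not an
    exllamav2 constant -- tune it to taste for your bpw target.
    """
    return quant_ppl <= base_ppl * (1.0 + max_rel_increase)

# A healthy 8-9 bpw quant should land very close to the base ppl;
# an exploded ppl (hundreds+) usually means the precision bug bit you.
print(quant_looks_sane(5.12, 5.15))   # small increase -> sane
print(quant_looks_sane(5.12, 512.0))  # exploded ppl -> broken quant
```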
measurement.json also records the base ppl I think.
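If you'd rather pull that out of the file programmatically, here's a hedged sketch that just scans the parsed JSON for perplexity-looking keys rather than assuming an exact field name, since I haven't verified the measurement.json layout:

```python
def find_ppl_entries(measurement: dict) -> dict:
    """Walk a parsed measurement.json-style dict and collect any entries
    whose key mentions perplexity.

    The exact key names exllamav2 writes aren't verified here, so this
    searches the whole structure instead of hardcoding a field name.
    """
    hits = {}

    def walk(node, prefix=""):
        if isinstance(node, dict):
            for k, v in node.items():
                path = f"{prefix}{k}"
                if "ppl" in k.lower() or "perplexity" in k.lower():
                    hits[path] = v
                walk(v, path + ".")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                walk(v, f"{prefix}{i}.")

    walk(measurement)
    return hits

# Demo on a made-up structure (the real measurement.json may differ):
sample = {"base_perplexity": 5.12,
          "layers": [{"key": "model.layers.0", "ppl": 5.3}]}
print(find_ppl_entries(sample))
```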