Is this actually 4.65bpw?
Hey @ParasiticRogue, I just remembered that I still need to test your model, ha ha.
In all honesty, I am quite shocked by this quant too, and have no idea what happened to it... I ran my standard conversion command, like I always do, but was surprised to find the output so small. Below is what I ran.
python convert.py
-i F:/exllamav2/exllamav2/mnt/models/brucethemoose_Yi-34B-200K-RPMerge/
-o F:/exllamav2/exllamav2/mnt/temp/exl2capy1/
-cf F:/exllamav2/exllamav2/mnt/models/MarinaraSpaghetti_brucethemoose_Yi-34B-200K-RPMerge-4.65bpw-h6-exl2/
-b 4.65
I tested the model and it worked perfectly fine for me, though. I would need to run some kind of perplexity test, since even the metadata does not say whether it's actually 4.65bpw. I assume something went wrong and it's actually a much smaller bpw, but I have no clue how to check that, lol. Will add a note to the Model Card about this quant being... cursed. Apologies for the inconvenience! I will redo it tonight with Bruce's updated script and see if it turns out bigger this time around.
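For what it's worth, one rough way to sanity-check the effective bpw is to compare the total size of the quantized weight files against the model's parameter count. This is just a minimal sketch under assumptions: the folder path and the ~34.4B parameter figure are mine, not from the quant itself, and it ignores that exllamav2 keeps some tensors (embeddings, head) at different precision.

```python
# Rough effective-bpw estimate: total bits in the quantized weight files
# divided by the parameter count. An approximation only -- exllamav2's own
# accounting treats some tensors differently.
from pathlib import Path

# Assumed values -- adjust to your quant folder and model size.
quant_dir = Path("F:/exllamav2/exllamav2/mnt/models/"
                 "MarinaraSpaghetti_brucethemoose_Yi-34B-200K-RPMerge-4.65bpw-h6-exl2")
num_params = 34.4e9  # Yi-34B is roughly 34.4B parameters

total_bytes = sum(f.stat().st_size for f in quant_dir.glob("*.safetensors"))
effective_bpw = total_bytes * 8 / num_params
print(f"{total_bytes / 1e9:.2f} GB on disk -> ~{effective_bpw:.2f} bits per weight")
```

If that number comes out far below 4.65, the quant really did land at a much smaller bpw than intended.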
Hey Marinara! I appreciate your work!
FYI, we're able to run 5.0bpw with the new 4-bit cache in ooba at 20k context on 24 GB of VRAM.
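For anyone wondering whether that fits, here's a back-of-envelope check. The architecture numbers (60 layers, 8 KV heads, head dim 128, ~34.4B params) are assumptions taken from Yi-34B's published config, and real usage adds overhead for activations, the CUDA context, and the runtime itself.

```python
# Back-of-envelope VRAM estimate for a ~34B model at 5.0 bpw with a 4-bit KV cache.
# Architecture numbers are assumed from Yi-34B's config; actual usage also
# includes activations, CUDA context, and framework overhead.
params = 34.4e9
bpw = 5.0
weights_gb = params * bpw / 8 / 1e9          # quantized weights

layers, kv_heads, head_dim = 60, 8, 128      # Yi-34B uses grouped-query attention
context = 20_480
cache_bits = 4                               # the new 4-bit cache
# K and V per token, per layer, across all KV heads:
cache_elems = 2 * layers * kv_heads * head_dim * context
cache_gb = cache_elems * cache_bits / 8 / 1e9

print(f"weights ~{weights_gb:.1f} GB + cache ~{cache_gb:.1f} GB "
      f"= ~{weights_gb + cache_gb:.1f} GB before overhead")
```

That lands around 22-23 GB before overhead, so 24 GB is tight but plausible, which lines up with what you're seeing.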