What is the bitwidth used to quantize the LM head? Thanks!
It's the default, 6 bpw (bits per weight). If in doubt, check the "quantization_config" key in config.json, specifically "head_bits".
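
If you'd rather check it programmatically, here is a minimal sketch. It assumes you have the model's config.json downloaded locally and that it has the "quantization_config" / "head_bits" layout mentioned above. The function name `read_head_bits` and the example path are placeholders of mine, not part of any library.

```python
import json
from pathlib import Path


def read_head_bits(config_path):
    """Return quantization_config.head_bits from a config.json, or None if absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # "quantization_config" may be missing or null on unquantized models.
    quant_config = config.get("quantization_config") or {}
    return quant_config.get("head_bits")


# Example usage (hypothetical path, adjust to wherever the model was downloaded):
# print(read_head_bits("path/to/model/config.json"))
```

A return value of None means the key isn't present in that file, so the config doesn't tell you the head bitwidth.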