metadata

base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar

Phind-CodeLlama-34B-v2 EXL2

Weights of Phind-CodeLlama-34B-v2 converted to EXL2 format.

Converted with the ExllamaV2 convert.py script, exllamav2 commit

Datasets used for calibration and PPL measurement

Conversion

Conversion arguments:

convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}

The 2.55 quant was converted using more rows: -r 400 -mr 64
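The per-quant invocations can be scripted. Below is a minimal sketch that only assembles the argument list from the command above, using the larger -r/-mr values for the 2.55 quant. `build_convert_args`, `ROW_OVERRIDES` and the paths in the example loop are hypothetical placeholders, not part of exllamav2.

```python
import shlex

# Default values passed to -r and -mr in the command above.
DEFAULT_ROWS = (200, 32)
# The 2.55 quant used larger -r / -mr values.
ROW_OVERRIDES = {"2.55": (400, 64)}


def build_convert_args(bpw, model_dir_fp16, wip_dir, model_dir_exl, calibration_dataset):
    """Return the convert.py argument list for one target bits-per-weight value."""
    rows, measurement_rows = ROW_OVERRIDES.get(bpw, DEFAULT_ROWS)
    return [
        "convert.py",
        "-i", model_dir_fp16,
        "-o", wip_dir,
        "-cf", model_dir_exl,
        "-c", calibration_dataset,
        "-r", str(rows),
        "-mr", str(measurement_rows),
        "-l", "4096",
        "-ml", "4096",
        "-hb", "8",
        "-b", bpw,
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the real directories and dataset file.
    for bpw in ["2.55", "2.8", "3.0", "4.625"]:
        args = build_convert_args(bpw, "fp16", "wip", f"exl2-{bpw}", "calibration.parquet")
        print(shlex.join(["python"] + args))
```

Printing the command instead of running it keeps the sketch safe to execute; passing the same list to `subprocess.run` would start the actual conversion.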

Perplexity

Perplexity was measured with the test_inference.py script:

test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
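For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that definition on a plain list of per-token log-probabilities. It is not taken from test_inference.py, whose exact windowing and tokenization are not described here, and `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.5 to every token has perplexity 2.
    print(perplexity([math.log(0.5)] * 8))
```

Lower is better: a perplexity near 2 means the model is, on average, about as uncertain as a fair choice between two tokens.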

Human-Eval

Evaluation

Samples for the Human-Eval scores of EXL2 quants were generated with the exl2.human-eval.py script:

python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 -o ${BPW}-samples.jsonl

Human-Eval samples of NF4/INT8 quants were generated with the tf.human-eval.py script:

python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
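The samples files above are in JSON Lines format. Assuming the scripts target the OpenAI human-eval harness (the source does not say so explicitly), each line holds a `task_id` and a `completion`. A minimal sketch of writing and reading such a file, where `write_samples` and `read_samples` are hypothetical helpers and the completion text is a placeholder:

```python
import json


def write_samples(path, samples):
    """Write one JSON object per line, each with task_id and completion keys."""
    with open(path, "w", encoding="utf-8") as f:
        for task_id, completion in samples:
            f.write(json.dumps({"task_id": task_id, "completion": completion}) + "\n")


def read_samples(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder completion; a real run would take it from the model output.
    write_samples("example-samples.jsonl", [("HumanEval/0", "    return True\n")])
    print(read_samples("example-samples.jsonl"))
```

With that harness installed, a file in this shape is scored with its `evaluate_functional_correctness` command.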

Comparison

Phind says that the original model in full weights achieves a Human-Eval score of 73.8. The NF4 quant gives me 70.73.

Claimed Human-Eval scores of WizardCoder models (full weights):

Model	Score
WizardCoder-Python-34B-V1.0	73.2
WizardCoder-Python-13B-V1.0	64.0

Vanilla Mistral-7B in INT8 scores 27.43.

EXL2 3.2-bpw quant of this model by firelzrd scores 60.97.

BPW (hb=8)	HumanEval	Evol-Ins PPL	Wiki PPL	File Size (GB)
2.55	40.24	2.0944	18.9843	10.62
2.8	63.41	2.0814	17.6326	11.58
3.0	66.46	2.0600	11.2096	12.36
4.625	70.12	2.0401	6.7243	18.63
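HumanEval contains 164 problems, so if each score is pass@1 from a single sample per problem (an assumption; the sampling setup is not stated), a score maps to a whole number of solved problems. A small sketch of that arithmetic on the HumanEval column of the table above, with `solved_count` as a hypothetical helper:

```python
HUMANEVAL_PROBLEMS = 164  # size of the HumanEval benchmark

# HumanEval scores copied from the table above, keyed by bpw.
SCORES = {"2.55": 40.24, "2.8": 63.41, "3.0": 66.46, "4.625": 70.12}


def solved_count(score_percent, total=HUMANEVAL_PROBLEMS):
    """Nearest whole number of solved problems for a percentage score."""
    return round(score_percent / 100 * total)


if __name__ == "__main__":
    for bpw, score in SCORES.items():
        n = solved_count(score)
        print(f"{bpw} bpw: {score} is about {n}/{HUMANEVAL_PROBLEMS} solved")
```

Read this way, the gap between the 3.0 and 4.625 quants is a handful of problems, while the 2.55 quant loses several dozen.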