metadata
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
Phind-CodeLlama-34B-v2 EXL2
Weights of Phind-CodeLlama-34B-v2 converted to EXL2 format.
Converted with the ExllamaV2 convert.py script, exllamav2 commit
Datasets used for calibration and PPL measurement
Conversion
Conversion arguments:
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
The 2.55 bpw quant was converted using even more rows: -r 400 -mr 64
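The per-quant differences above can be captured in a small helper that assembles the argument list for each target bitrate. This is only a sketch: the flags and values are copied from the conversion arguments listed here, while the helper name `convert_command`, the directory paths, and the calibration file name are hypothetical placeholders.

```python
import shlex

# Flags shared by every quant, copied from the conversion arguments above.
BASE_FLAGS = ["-l", "4096", "-ml", "4096", "-hb", "8"]

# Values passed to -r and -mr. The 2.55 bpw quant used larger values.
DEFAULT_ROWS = ("200", "32")
ROWS_OVERRIDE = {"2.55": ("400", "64")}


def convert_command(bpw, model_dir_fp16, wip_dir, model_dir_exl, calibration_dataset):
    """Return the convert.py argument list for one target bits-per-weight."""
    rows, measure_rows = ROWS_OVERRIDE.get(bpw, DEFAULT_ROWS)
    return [
        "convert.py",
        "-i", model_dir_fp16,
        "-o", wip_dir,
        "-cf", model_dir_exl,
        "-c", calibration_dataset,
        "-r", rows,
        "-mr", measure_rows,
        *BASE_FLAGS,
        "-b", bpw,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for bpw in ["2.55", "2.8", "3.0", "4.625"]:
        cmd = convert_command(
            bpw, "models/fp16", "wip", f"models/exl2-{bpw}", "calibration.parquet"
        )
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; the resulting lines could be pasted into a shell from the exllamav2 checkout.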
Perplexity
Perplexity was measured with the test_inference.py script:
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
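For readers unfamiliar with the metric, the sketch below shows the standard definition of perplexity: the exponential of the mean negative log-likelihood per token. It illustrates the quantity being reported and is not a description of how test_inference.py computes it internally; the function name `perplexity` and the toy inputs are assumptions made for the example.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities the model assigned to
    each ground-truth token of the evaluation text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy example: four tokens predicted with probability 0.5 or 0.25.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(log_probs))
```

Lower is better: a model that assigns higher probability to the reference tokens gets a lower perplexity.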
Human-Eval
Evaluation
Samples for the Human-Eval scores of the EXL2 quants were generated with the exl2.human-eval.py script:
python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 -o ${BPW}-samples.jsonl
Human-Eval samples of the NF4/INT8 quants were generated with the tf.human-eval.py script:
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
Comparison
Phind says that the original full-weight model achieves a Human-Eval score of 73.8. The NF4 quant gives me 70.73.
Human-Eval scores claimed for the WizardCoder models (full weights):
Model | Score |
---|---|
WizardCoder-Python-34B-V1.0 | 73.2 |
WizardCoder-Python-13B-V1.0 | 64.0 |
Vanilla Mistral-7B INT8 scores 27.43.
The EXL2 3.2-bpw quant of this model by firelzrd scores 60.97.
BPW (hb=8) | HumanEval | Evol-Ins PPL | Wiki PPL | File Size (GB) |
---|---|---|---|---|
2.55 | 40.24 | 2.0944 | 18.9843 | 10.62 |
2.8 | 63.41 | 2.0814 | 17.6326 | 11.58 |
3.0 | 66.46 | 2.0600 | 11.2096 | 12.36 |
4.625 | 70.12 | 2.0401 | 6.7243 | 18.63 |
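One way to read the table is to express each quant's HumanEval score as a share of the 73.8 full-weight score claimed by Phind. The sketch below does that arithmetic with values copied from the table; the helper name `retained` is hypothetical, and because the reference score comes from Phind's own evaluation setup, the percentages are only a rough guide.

```python
# Full-weight HumanEval score claimed by Phind, per the comparison text.
CLAIMED_FULL_WEIGHT_SCORE = 73.8

# (bpw, HumanEval score, file size) copied from the table above.
QUANTS = [
    (2.55, 40.24, 10.62),
    (2.8, 63.41, 11.58),
    (3.0, 66.46, 12.36),
    (4.625, 70.12, 18.63),
]


def retained(score, reference=CLAIMED_FULL_WEIGHT_SCORE):
    """Return score as a percentage of the reference score."""
    return 100.0 * score / reference


for bpw, score, size in QUANTS:
    print(f"{bpw:>5} bpw: HumanEval {score:.2f} "
          f"({retained(score):.1f}% of claimed full-weight score), size {size:.2f}")
```

Read this way, the step from 2.55 to 2.8 bpw recovers far more of the score than any later step, while costing less than one extra unit of file size.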