metadata
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar

Phind-CodeLlama-34B-v2 EXL2

Weights of Phind-CodeLlama-34B-v2 converted to the EXL2 format used by exllamav2.

Each quant is in a separate branch, as in TheBloke's GPTQ repos. To download a single quant, clone only its branch:

```sh
export BRANCH=5_0-bpw-h8
git clone --single-branch --branch ${BRANCH} https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
```
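
The same single-branch clone can be scripted from Python. This is a minimal sketch, assuming `git` and `git-lfs` are installed and on `PATH` (without git-lfs, the weight files arrive as small LFS pointer files instead of the actual weights). `clone_command` and `clone_quant` are hypothetical helper names, not part of any library.

```python
from __future__ import annotations

import shlex
import subprocess

# Repository URL taken from the clone command above.
REPO_URL = "https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2"


def clone_command(branch: str, repo_url: str = REPO_URL) -> list[str]:
    """Build the argv for a single-branch clone of one quant (hypothetical helper)."""
    return ["git", "clone", "--single-branch", "--branch", branch, repo_url]


def clone_quant(branch: str) -> None:
    """Run the clone; assumes git and git-lfs are installed (not checked here)."""
    subprocess.run(clone_command(branch), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command instead of running it.
    print(shlex.join(clone_command("5_0-bpw-h8")))
```

Building the argv as a list rather than a shell string avoids quoting problems and keeps the branch name from being interpreted by a shell.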

The following branches are available:

5_0-bpw-h8
5_0-bpw-h8-evol-ins
4_625-bpw-h6
4_4-bpw-h8
4_125-bpw-h6
3_8-bpw-h6
2_75-bpw-h6
2_55-bpw-h6
  • Calibration dataset used for conversion (all branches except 5_0-bpw-h8-evol-ins): wikitext-v2
  • Evaluation dataset used to calculate perplexity: wikitext-v2
  • Calibration dataset used for conversion of 5_0-bpw-h8-evol-ins: wizardLM-evol-instruct_70k
  • Evaluation dataset used to calculate PPL on Evol-Ins: nickrosh-evol-instruct-code-80k
  • The 4_4-bpw-h8 quant was converted with the additional `-mr 32` argument.
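
The branch names follow a regular pattern: bits per weight with `_` as the decimal separator, then `-bpw-hN`, then an optional `-evol-ins` suffix for the Evol-Instruct-calibrated quant. The sketch below parses them, assuming (as is conventional for EXL2 quants, but not stated in this card) that `hN` is the bit width of the output (head) layer. `parse_branch` and `Quant` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# <whole>_<fraction>-bpw-h<head bits>[-evol-ins]; "hN" is ASSUMED to mean head-layer bits.
BRANCH_RE = re.compile(r"^(\d+)_(\d+)-bpw-h(\d+)(-evol-ins)?$")


@dataclass(frozen=True)
class Quant:
    branch: str
    bpw: float
    head_bits: int
    calibration: str


def parse_branch(branch: str) -> Quant:
    """Split a branch name like '4_625-bpw-h6' into its parts (hypothetical helper)."""
    m = BRANCH_RE.match(branch)
    if m is None:
        raise ValueError(f"unrecognized branch name: {branch!r}")
    whole, frac, head, evol = m.groups()
    # Calibration datasets as listed in the notes above.
    calibration = "wizardLM-evol-instruct_70k" if evol else "wikitext-v2"
    return Quant(branch, float(f"{whole}.{frac}"), int(head), calibration)


for name in ["5_0-bpw-h8", "5_0-bpw-h8-evol-ins", "2_55-bpw-h6"]:
    print(parse_branch(name))
```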

Perplexity (PPL) was measured with the exllamav2 `test_inference.py` script, for example on the Evol-Ins dataset:

```sh
python test_inference.py -m /storage/models/LLaMA/EXL2/Phind-Codellama-34B-v2 -ed /storage/datasets/text/evol-instruct/nickrosh-evol-instruct-code-80k.parquet
```
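
For reference, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below shows only that standard definition; how `test_inference.py` splits the dataset into windows and handles context length is not described here, so this is not a reimplementation of the script. `perplexity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood; inputs are natural-log token probabilities."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that gives every token probability 0.25 has perplexity 4 (up to float rounding).
print(perplexity([math.log(0.25)] * 10))
```

This is why the low-bpw quants stand out in the table: a PPL of 11.03 versus 6.64 means the model is, on average, far less certain about each next token.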

| BPW | PPL on Wiki | PPL on Evol-Ins | File size (GB) |
|-----------|---------|--------|-------|
| 2.55-h6 | 11.0310 | 2.4542 | 10.56 |
| 2.75-h6 | 9.7902 | 2.2888 | 11.33 |
| 3.8-h6 | 6.7293 | 2.0724 | 15.37 |
| 4.125-h6 | 6.6713 | 2.0617 | 16.65 |
| 4.4-h8 | 6.6487 | 2.0509 | 17.76 |
| 4.625-h6 | 6.6576 | 2.0459 | 18.58 |
| 5.0-h8 | 6.6379 | 2.0419 | 20.09 |
| 5.0-h8-ev | 6.7785 | 2.0445 | 20.09 |
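
The table can be used to pick the smallest quant whose perplexity stays within some relative tolerance of the best measured value. This is a sketch using the numbers above; the tolerance values are hypothetical choices for illustration, and `smallest_within` is not part of exllamav2.

```python
from typing import NamedTuple


class Row(NamedTuple):
    branch: str
    ppl_wiki: float
    ppl_evol: float
    size_gb: float


# Values copied from the table above.
RESULTS = [
    Row("2_55-bpw-h6", 11.0310, 2.4542, 10.56),
    Row("2_75-bpw-h6", 9.7902, 2.2888, 11.33),
    Row("3_8-bpw-h6", 6.7293, 2.0724, 15.37),
    Row("4_125-bpw-h6", 6.6713, 2.0617, 16.65),
    Row("4_4-bpw-h8", 6.6487, 2.0509, 17.76),
    Row("4_625-bpw-h6", 6.6576, 2.0459, 18.58),
    Row("5_0-bpw-h8", 6.6379, 2.0419, 20.09),
    Row("5_0-bpw-h8-evol-ins", 6.7785, 2.0445, 20.09),
]


def smallest_within(tolerance: float, metric: str = "ppl_wiki") -> str:
    """Smallest branch whose `metric` is within `tolerance` (relative) of the best value."""
    best = min(getattr(r, metric) for r in RESULTS)
    ok = [r for r in RESULTS if getattr(r, metric) <= best * (1 + tolerance)]
    return min(ok, key=lambda r: r.size_gb).branch


print(smallest_within(0.01))
print(smallest_within(0.005, metric="ppl_evol"))
```

With these numbers, a 1% tolerance on wikitext PPL selects `4_125-bpw-h6`, while a 0.5% tolerance on Evol-Ins PPL selects `4_4-bpw-h8`.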