---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---

# Phind-CodeLlama-34B-v2 EXL2

Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to the [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

Converted with the ExLlamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script
at exllamav2 [commit 31f31e1](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

The original model in full weights achieves a **73.8** HumanEval score. Here are the scores of the EXL2 quants:

| BPW (hb=8) | HumanEval | Evol-Ins PPL | Wiki PPL | File Size (GB) |
| ---------- | --------- | ------------ | -------- | -------------- |
| 2.55       | **40.24** | 2.0944       | 18.9843  | 10.62          |
| 2.8        | **63.41** | 2.0814       | 17.6326  | 11.58          |
| 3.0        | **66.46** | 2.0600       | 11.2096  | 12.36          |
| 4.625      | **70.12** | 2.0401       | 6.7243   | 18.63          |
| 4.8        | **70.73** | 2.0361       | 6.7263   | 19.32          |

## Downloads

If you just do `git clone`, you will get the weights of all the quants, which is probably not
what you want. Instead, download the following common files and put them in the same directory:

* [config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/config.json)
* [generation_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/generation_config.json)
* [special_tokens_map.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/special_tokens_map.json)
* [tokenizer.model](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer.model)
* [tokenizer_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer_config.json)

Then add the weights of a particular quant: all the safetensors files plus the `model.safetensors.index.json` file from that quant's directory.

Either download these files via the Web UI or, e.g., with curl:

```
mkdir phind-2.55
cd phind-2.55
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/generation_config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/special_tokens_map.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer.model
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer_config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/2.55/model.safetensors.index.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00001-of-00002.safetensors
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00002-of-00002.safetensors
```
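
Alternatively, the same files can be fetched from Python with the `huggingface_hub` library. A minimal sketch (the pattern list and target directory are illustrative; note that the repo layout is preserved, so the quant's files land in a `2.55/` subdirectory and still need to be placed next to the shared files):

```python
# Sketch: download the shared tokenizer/config files plus one quant directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="latimar/Phind-Codellama-34B-v2-megacode-exl2",
    allow_patterns=["*.json", "tokenizer.model", "2.55/*"],  # adjust "2.55/*" to the quant you want
    local_dir="phind-exl2",
)
```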
## Datasets used for calibration and PPL measurement

* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)
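
These correspond to standard Hub datasets. A small sketch of loading the two PPL evaluation sets with the `datasets` library (an assumption about tooling; the parquet files linked above are the exact splits used):

```python
# Sketch: load the evaluation sets behind the PPL columns in the table above.
from datasets import load_dataset

wiki = load_dataset("wikitext", "wikitext-2-v1", split="validation")
evol = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")
print(len(wiki), len(evol))
```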
### Conversion

Conversion arguments:

```
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
```

The `2.55` quant was converted using even more calibration rows: `-r 400 -mr 64`.
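
Once converted (or downloaded), a quant directory can be used directly from Python. A rough sketch with the exllamav2 API of roughly this vintage (class and argument names are from memory and may differ between versions):

```python
# Rough sketch: load an EXL2 quant and run a quick generation with exllamav2.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "phind-2.55"   # directory with config.json, tokenizer files and the quant's safetensors
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)       # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple("Write a Python function that checks whether a number is prime.", settings, 256))
```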
### Perplexity

Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:

```
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
```
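
For intuition, the reported perplexity is just the exponential of the mean per-token negative log-likelihood over the evaluation text. A generic sketch of the same measurement with `transformers` (not the exllamav2 script; chunking and context handling differ in detail):

```python
# Generic perplexity sketch: PPL = exp(mean negative log-likelihood per token).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "phind-fp16"  # path to full-precision weights (illustrative)
tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16, device_map="auto")

ids = tok(open("eval.txt").read(), return_tensors="pt").input_ids.to(model.device)

total_nll, total_tokens, stride = 0.0, 0, 2048
for i in range(0, ids.size(1), stride):
    chunk = ids[:, i : i + stride]
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        loss = model(input_ids=chunk, labels=chunk).loss  # mean NLL over this chunk
    total_nll += loss.item() * (chunk.size(1) - 1)
    total_tokens += chunk.size(1) - 1

print("perplexity:", math.exp(total_nll / total_tokens))
```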
### Human-Eval

#### Evaluation

Samples for the Human-Eval scores of the EXL2 quants were generated with the [exl2.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
script:

```
python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 -o ${BPW}-samples.jsonl
```

Human-Eval samples of the NF4/INT8 quants were generated with the [tf.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:

```
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
```
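
The resulting `*-samples.jsonl` files can then be scored with OpenAI's `human-eval` package; a minimal sketch, assuming that package is how pass@1 was computed (note that `human-eval` ships with the unsandboxed code-execution call commented out, and you have to enable it before scoring):

```python
# Sketch: compute pass@1 for a samples file with OpenAI's human-eval package.
from human_eval.evaluation import evaluate_functional_correctness

scores = evaluate_functional_correctness("4.8-samples.jsonl", k=[1])
print(scores)  # e.g. {'pass@1': 0.7073}
```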
#### Comparison

Phind says that the original model in full weights achieves a **73.8** Human-Eval score.
The NF4 quant gives me **70.73**.
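
For the NF4 number, the full-weight model is presumably loaded in 4-bit NF4 via `bitsandbytes`; a minimal sketch of that setup with `transformers` (an assumption about the exact flags used by `tf.human-eval.py`):

```python
# Sketch: load the full-precision checkpoint in 4-bit NF4 with bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tok = AutoTokenizer.from_pretrained("Phind/Phind-CodeLlama-34B-v2")
model = AutoModelForCausalLM.from_pretrained(
    "Phind/Phind-CodeLlama-34B-v2",
    quantization_config=bnb_config,
    device_map="auto",
)
```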
Claimed Human-Eval scores of the WizardCoder models (full weights):

| Model                       | Score    |
| --------------------------- | -------- |
| WizardCoder-Python-34B-V1.0 | **73.2** |
| WizardCoder-Python-13B-V1.0 | **64.0** |

Vanilla Mistral-7B in INT8 scores **27.43**.

The [EXL2 3.2-bpw quant](https://huggingface.co/firelzrd/Phind-CodeLlama-34B-v2-exl2/tree/3_2-bpw) of this model by [firelzrd](https://huggingface.co/firelzrd)
scores **60.97**.