---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---

# Phind-CodeLlama-34B-v2 EXL2
 
Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

Converted with the ExLlamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script
at exllamav2 commit [31f31e1](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

The original model in full weights achieves a HumanEval score of **73.8**. Scores of the EXL2 quants:

| BPW (hb=8)  | HumanEval | Evol-Ins PPL | Wiki PPL   | File Size (GB) |
| ----------- | --------- | ------------ | ---------- | -------------- |
|  2.55       | **40.24** | 2.0944       | 18.9843    |   10.62        |
|  2.8        | **63.41** | 2.0814       | 17.6326    |   11.58        |
|  3.0        | **66.46** | 2.0600       | 11.2096    |   12.36        |
|  4.625      | **70.12** | 2.0401       | 6.7243     |   18.63        |
|  4.8        | **70.73** | 2.0361       | 6.7263     |   19.32        |

## Downloads

A plain `git clone` downloads the weights of every quant, which is probably not
what you want. Instead, download the following common files and put them in one directory:

* [config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/config.json)
* [generation_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/generation_config.json)
* [special_tokens_map.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/special_tokens_map.json)
* [tokenizer.model](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer.model)
* [tokenizer_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer_config.json)

Then add the weights of a particular quant: all `.safetensors` files plus the `model.safetensors.index.json` file from that quant's directory.

Either download these files via the Web UI, or, e.g., with curl:
```
mkdir phind-2.55
cd phind-2.55
# Use /resolve/ URLs: /blob/ serves an HTML page, and /raw/ returns
# only the small pointer file for LFS-tracked files such as the weights.
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/generation_config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/special_tokens_map.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/tokenizer.model
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/tokenizer_config.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/model.safetensors.index.json
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00001-of-00002.safetensors
curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00002-of-00002.safetensors
```
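
The same download can be scripted for any quant. The sketch below builds the list of URLs for one quant directory and fetches them with the standard library. It uses the `/resolve/main/` URL form, which returns the actual file contents for LFS-tracked files as well. The function names are hypothetical, the shard naming pattern is taken from the curl example above, and the number of shards for quants other than `2.55` is an assumption you need to check in the repository.

```python
import os
from typing import List
from urllib.request import urlretrieve

REPO_URL = "https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2"

# Files shared by all quants, as listed above.
COMMON_FILES = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
    "tokenizer_config.json",
]


def quant_file_urls(bpw: str, shards: int) -> List[str]:
    """Build download URLs for the common files plus one quant's weights.

    `shards` is the number of safetensors files in the quant directory
    (2 for the 2.55 quant; an assumption for any other quant).
    """
    base = f"{REPO_URL}/resolve/main"
    urls = [f"{base}/{name}" for name in COMMON_FILES]
    urls.append(f"{base}/{bpw}/model.safetensors.index.json")
    for i in range(1, shards + 1):
        urls.append(f"{base}/{bpw}/output-{i:05d}-of-{shards:05d}.safetensors")
    return urls


def download_quant(bpw: str, shards: int, target_dir: str) -> None:
    """Download every file for one quant into `target_dir`."""
    os.makedirs(target_dir, exist_ok=True)
    for url in quant_file_urls(bpw, shards):
        filename = url.rsplit("/", 1)[1]
        urlretrieve(url, os.path.join(target_dir, filename))
```

For example, `download_quant("2.55", 2, "phind-2.55")` would fetch the same eight files as the curl commands.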

## Datasets used for calibration and PPL measurement
 
* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)

### Conversion

Conversion arguments:

```
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
```

The `2.55` quant was converted with more rows: `-r 400 -mr 64`.
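
When converting several quants in a row, it can help to build the argument list in code. The sketch below only assembles the command line shown above, with the flags copied verbatim; the function name, the parameter names, and the example paths are hypothetical, and reading `-r` and `-mr` as row counts follows the note about the `2.55` quant.

```python
from typing import List


def convert_args(
    model_dir_fp16: str,
    wip_dir: str,
    model_dir_exl: str,
    calibration_dataset: str,
    bpw: str,
    rows: int = 200,
    measurement_rows: int = 32,
) -> List[str]:
    """Return the argv list for one convert.py run with the flags used above."""
    return [
        "python", "convert.py",
        "-i", model_dir_fp16,        # input model in full weights
        "-o", wip_dir,               # working directory
        "-cf", model_dir_exl,        # output directory for the finished quant
        "-c", calibration_dataset,   # calibration dataset
        "-r", str(rows),
        "-mr", str(measurement_rows),
        "-l", "4096",
        "-ml", "4096",
        "-hb", "8",                  # head bits, the hb=8 of the scores table
        "-b", bpw,                   # target bits per weight
    ]


if __name__ == "__main__":
    # Hypothetical paths; the 2.55 quant uses the larger row counts.
    args = convert_args("phind-fp16", "wip", "phind-2.55", "calibration.parquet",
                        "2.55", rows=400, measurement_rows=64)
    print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` from the exllamav2 checkout.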

### Perplexity

Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:
```
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
```
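
For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that standard definition only; it is not the code of `test_inference.py`, whose windowing and batching details are not covered here, and the function name is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Compute perplexity from per-token natural-log probabilities.

    Each entry is log p(token | preceding tokens) as assigned by the model.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that gives every token probability 1/4 has perplexity 4.
    print(perplexity([math.log(0.25)] * 8))
```

Lower is better: a lower value means the model assigns higher probability to the evaluation text.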

### Human-Eval

#### Evaluation

Samples for the Human-Eval scores of the EXL2 quants were generated with the [exl2.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
script:
```
python exl2.human-eval.py -m ${MODEL_DIR_EXL} -c 4096 -o ${BPW}-samples.jsonl
```

Human-Eval samples of the NF4/INT8 quants were generated with the [tf.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:
```
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
```
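
Once the samples have been evaluated, the score is the percentage of problems whose single sample passed. The sketch below shows that calculation under two assumptions: there is exactly one sample per task, and each evaluated record carries a `task_id` and a boolean `passed` field. Both field names and the function name are assumptions made for illustration, not something the scripts above guarantee.

```python
from typing import Any, Dict, Iterable


def pass_at_1(results: Iterable[Dict[str, Any]]) -> float:
    """Return the percentage of tasks whose single sample passed.

    Assumes one record per task with `task_id` and `passed` keys.
    """
    seen = set()
    passed = 0
    for record in results:
        task_id = record["task_id"]
        if task_id in seen:
            raise ValueError(f"more than one sample for {task_id}")
        seen.add(task_id)
        if record["passed"]:
            passed += 1
    if not seen:
        raise ValueError("no results given")
    return 100.0 * passed / len(seen)


if __name__ == "__main__":
    # HumanEval has 164 problems; 116 passes would give about 70.73.
    demo = [{"task_id": f"HumanEval/{i}", "passed": i < 116} for i in range(164)]
    print(f"{pass_at_1(demo):.2f}")
```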

#### Comparison

Phind reports that the original model in full weights achieves a Human-Eval score of **73.8**.
The NF4 quant gives me **70.73**.

Human-Eval scores claimed for the WizardCoder models (full weights):

| Model | Score |
| ----- | ----- |
| WizardCoder-Python-34B-V1.0 | **73.2** |
| WizardCoder-Python-13B-V1.0 | **64.0** |

Vanilla Mistral-7B in INT8 scores **27.43**.

The [EXL2 3.2-bpw quant](https://huggingface.co/firelzrd/Phind-CodeLlama-34B-v2-exl2/tree/3_2-bpw) of this model by [firelzrd](https://huggingface.co/firelzrd)
scores **60.97**.