---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-CodeLlama-34B-v2
model_type: llama
quantized_by: latimar
---

# Phind-CodeLlama-34B-v2 EXL2

Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

Converted with the ExllamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script,
at exllamav2 commit [31f31e1](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

| BPW (hb=8) | Human-Eval | Evol-Ins PPL | Wiki PPL | File Size (GB) |
| ---------- | ---------- | ------------ | -------- | -------------- |
| 2.55       | 0.402439   | 2.0944       | 18.9843  | 10.62          |
| 3.0        | 0.664634   | 2.0600       | 11.2096  | 12.36          |
| 4.625      | 0.701219   | 2.0401       | 6.7243   | 18.63          |
| 5.0        | 0.670731   | 2.0391       | 6.6956   | 20.09          |
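
These quants can be loaded with the exllamav2 Python API. Below is a minimal generation sketch based on the upstream example code from roughly the same vintage as the commit above; the exact API may have changed since, and the model path is a placeholder:

```
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Phind-CodeLlama-34B-v2-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
model.load()

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.8

# Third positional argument is the number of new tokens to generate.
print(generator.generate_simple("def fibonacci(n):", settings, 200))
```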

## Datasets used for calibration and PPL measurement

* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)

### Conversion

Conversion arguments:

```
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
```

The `2.55` quant was converted using even more rows: `-r 400 -mr 64`.

### Perplexity

Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:

```
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
```
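
For context, the PPL columns in the table are plain perplexity, i.e. the exponential of the mean per-token cross-entropy over the evaluation text. A generic illustration of the measured quantity (not the exllamav2 implementation; path and file names are placeholders):

```
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/models/Phind-CodeLlama-34B-v2"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

text = open("eval.txt").read()  # placeholder evaluation text
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean cross-entropy per token
print(math.exp(loss.item()))  # perplexity
```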

### Human-Eval

For a point of reference, Phind reports that the original model achieves a **73.8** Human-Eval score.

Unfortunately, the FP16/INT8 weights of this model won't fit on my RTX 4090, but FP16 quantized to NF4 fits,
so I generated samples with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:

```
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
```

The NF4 variant gives **0.70731707**.
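
NF4 here means the 4-bit NormalFloat quantization provided by `transformers` and `bitsandbytes`. A minimal loading sketch, assuming a recent `transformers` with bitsandbytes support (the checkpoint path is a placeholder):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # do compute in fp16
)

model_dir = "/models/Phind-CodeLlama-34B-v2"  # placeholder FP16 checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_dir, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
```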

Samples for the Human-Eval scores of the EXL2 quants were generated with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py) script:

```
python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 ${BPW}-samples.jsonl
```
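
The generated sample files can then be scored with OpenAI's [human-eval](https://github.com/openai/human-eval) harness. A sketch of that scoring step, assuming the package is installed from the repo (this invocation is an assumption, not part of this repo's scripts):

```
# Assumes OpenAI's human-eval package is installed (pip install -e . from a clone).
from human_eval.evaluation import evaluate_functional_correctness

# One completion per problem, so only pass@1 is meaningful.
results = evaluate_functional_correctness("nf4-samples.jsonl", k=[1])
print(results)  # e.g. {'pass@1': 0.70731707}
```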