	---
	base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
	inference: false
	license: llama2
	model_creator: https://huggingface.co/Phind
	model_name: Phind-Codellama-34B-v2
	model_type: llama
	quantized_by: latimar
	---

	# Phind-CodeLlama-34B-v2 EXL2

	Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
	to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

	Converted with the ExLlamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script at
	exllamav2 [commit 31f31e1](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).
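EXL2 can mix quantization levels across the layers of a model, so a BPW figure such as 4.625 is an average. The sketch below computes that average as a mean weighted by weight count. The split is hypothetical and is not the actual layer assignment of these quants. It also ignores the small overhead of quantization scales.

```python
# Hypothetical illustration of an "average bits per weight" figure.
# The layer split below is made up; real EXL2 quants choose per-layer settings
# from a measurement pass, and scale/metadata overhead is ignored here.

def average_bpw(groups: list[tuple[int, float]]) -> float:
    """Weighted mean of bit widths; each item is (num_weights, bits_per_weight)."""
    total_weights = sum(n for n, _ in groups)
    total_bits = sum(n * bits for n, bits in groups)
    return total_bits / total_weights

# 62.5% of the weights at 5 bits, 37.5% at 4 bits (hypothetical split).
mix = [(625, 5.0), (375, 4.0)]
print(average_bpw(mix))
```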


	| BPW (head bits = 8) | Human-Eval | Evol-Ins PPL | Wiki PPL | File Size (GB) |
	| ------------------- | ---------- | ------------ | -------- | -------------- |
	| 2.55 | 0.402439 | 2.0944 | 18.9843 | 10.62 |
	| 3.0 | 0.664634 | 2.0600 | 11.2096 | 12.36 |
	| 4.625 | 0.701219 | 2.0401 | 6.7243 | 18.63 |
	| 5.0 | 0.670731 | 2.0391 | 6.6956 | 20.09 |
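File size grows almost linearly with BPW across the four rows of the table. A least-squares fit over those rows gives a ballpark size for a bitrate that isn't listed. This sketch uses only the table's numbers, and the linear model is an assumption.

```python
# Least-squares line through the (BPW, file size) pairs from the table.
# Assumes size is linear in BPW; the intercept lumps together whatever
# does not scale with BPW.
bpw = [2.55, 3.0, 4.625, 5.0]
size = [10.62, 12.36, 18.63, 20.09]

n = len(bpw)
mean_x = sum(bpw) / n
mean_y = sum(size) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bpw, size))
sxx = sum((x - mean_x) ** 2 for x in bpw)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"size ~ {intercept:.2f} + {slope:.2f} * BPW")
# Interpolated estimate for a BPW not in the table (an estimate, not a measurement).
print(f"4.0 BPW ~ {intercept + slope * 4.0:.1f}")
```

On these four rows the fit comes out at roughly 0.77 + 3.86 * BPW and matches every listed size to within about 0.01. Treat interpolated values as estimates, not measured file sizes.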

	## Datasets used for calibration and PPL measurement

	* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
	* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
	* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)


	### Conversion

	Conversion command (run once per target `${BPW}`):

	```sh
	python convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
	```

	The `2.55` quant was converted with twice as many calibration rows: `-r 400 -mr 64`.
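The four conversions differ only in `-b`, plus `-r`/`-mr` for the 2.55 quant. A hypothetical Python helper that assembles the commands could look like the sketch below. The paths are placeholders, and the flags are copied from the conversion command in this README.

```python
import shlex

# Hypothetical helper: builds the convert.py argument list for one target BPW.
# Paths are placeholders; flags mirror the conversion command in this README,
# with the larger -r/-mr values used for the 2.55 quant.
def convert_cmd(bpw: float, model_fp16: str, work_dir: str, out_dir: str,
                cal_dataset: str) -> list[str]:
    rows, meas_rows = ("400", "64") if bpw == 2.55 else ("200", "32")
    return [
        "python", "convert.py",
        "-i", model_fp16,      # FP16 source model directory
        "-o", work_dir,        # working directory for the job
        "-cf", out_dir,        # directory for the finished EXL2 model
        "-c", cal_dataset,     # calibration dataset
        "-r", rows, "-mr", meas_rows,
        "-l", "4096", "-ml", "4096",
        "-hb", "8",
        "-b", str(bpw),
    ]

for target in (2.55, 3.0, 4.625, 5.0):
    cmd = convert_cmd(target, "models/fp16", f"work/{target}",
                      f"models/exl2-{target}", "calibration.parquet")
    # Print a shell-ready line; pass `cmd` to subprocess.run(...) to execute it.
    print(shlex.join(cmd))
```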

	### Perplexity

	Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:
	```sh
	python test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
	```
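Perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The minimal sketch below implements only that definition. It does not reproduce how `test_inference.py` tokenizes, batches, or windows the evaluation text.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy check: a model that gives probability 1/4 to every observed token
# is as "perplexed" as a uniform choice among 4 options.
logps = [math.log(0.25)] * 10
print(perplexity(logps))
```

Read against the table, Wiki PPL rises sharply below 4.625 BPW (11.21 at 3.0 and 18.98 at 2.55). Over the same range, Evol-Ins PPL changes by less than 0.06.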

	### Human-Eval

	For reference, Phind reports that the original model scores 73.8% pass@1 on Human-Eval (0.738 on the scale used in the table).
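Human-Eval scores are pass@k values, and OpenAI's human-eval harness computes them with an unbiased estimator. Below is a direct, not numerically optimized, Python version of that estimator. It assumes `n` samples were generated for a problem, of which `c` pass the tests.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n = k = 1), each problem scores 1.0 or 0.0,
# so the benchmark score is the fraction of problems solved.
per_problem = [pass_at_k(1, 1, 1), pass_at_k(1, 0, 1), pass_at_k(1, 1, 1)]
print(sum(per_problem) / len(per_problem))
```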

	Unfortunately, the FP16 and INT8 weights of this model don't fit in the 24 GB of VRAM on my RTX 4090, but the FP16 weights quantized to NF4 do,
	so I generated samples for the NF4 variant with the [tf.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:
	```
	python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
	```
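The arithmetic behind those limits is parameter count times bytes per weight. The back-of-the-envelope sketch below assumes about 33.7B parameters and counts weights only. It ignores the KV cache, activations, and quantization metadata, so real usage is higher.

```python
# Rough weight-only memory estimate in decimal GB.
# Assumption: ~33.7B parameters for the 34B CodeLlama base.
# KV cache, activations and quantization metadata are ignored.
PARAMS = 33.7e9
VRAM_GB = 24  # RTX 4090

def weights_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("NF4", 4)]:
    gb = weights_gb(bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{name}: ~{gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```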

	The NF4 variant scores 0.70731707.

	Samples for the Human-Eval scores of the EXL2 quants were generated with the [exl2.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
	script:
	```
	python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 ${BPW}-samples.jsonl
	```
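Human-Eval has 164 problems. If there is one sample per problem, which is an assumption since the sample count isn't stated here, each score should map to a whole number of solved problems. The small check below converts this README's scores back to counts.

```python
# Hypothetical helper: convert a Human-Eval score into a count of solved
# problems, assuming the 164-problem set and one sample per problem.
HUMAN_EVAL_PROBLEMS = 164

def solved_count(score: float, total: int = HUMAN_EVAL_PROBLEMS) -> int:
    return round(score * total)

# Scores from this README: the four EXL2 quants and the NF4 baseline.
scores = {"2.55": 0.402439, "3.0": 0.664634, "4.625": 0.701219,
          "5.0": 0.670731, "NF4": 0.70731707}
for name, score in scores.items():
    solved = solved_count(score)
    print(f"{name}: {solved}/{HUMAN_EVAL_PROBLEMS} "
          f"(= {solved / HUMAN_EVAL_PROBLEMS:.6f})")
```

All five scores land on whole counts (66, 109, 115, 110 and 116 of 164), which is consistent with the one-sample assumption. The best EXL2 quant (4.625) trails the NF4 baseline by a single problem.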