	---
	base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
	inference: false
	license: llama2
	model_creator: https://huggingface.co/Phind
	model_name: Phind-Codellama-34B-v2
	model_type: llama
	quantized_by: latimar
	---

	# Phind-CodeLlama-34B-v2 EXL2

	Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
	to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

	Converted with the ExllamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script
	at exllamav2 commit [31f31e1](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

	The original full-weight model achieves a HumanEval score of 73.8. Scores of the EXL2 quants:

	| BPW (hb=8) | HumanEval | Evol-Ins PPL | Wiki PPL | File Size (GB) |
	| ---------- | --------- | ------------ | -------- | -------------- |
	| 2.55       | 40.24     | 2.0944       | 18.9843  | 10.62          |
	| 2.8        | 63.41     | 2.0814       | 17.6326  | 11.58          |
	| 3.0        | 66.46     | 2.0600       | 11.2096  | 12.36          |
	| 4.625      | 70.12     | 2.0401       | 6.7243   | 18.63          |
	| 4.8        | 70.73     | 2.0361       | 6.7263   | 19.32          |
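
To make the table easier to read against the 73.8 baseline, here is a small sketch that expresses each quant's HumanEval score as a share of the full-weight score. The `retained` helper is a hypothetical name introduced for illustration; the numbers are copied from the table above.

```python
BASELINE = 73.8  # full-weight HumanEval score quoted above

# BPW -> HumanEval score, copied from the table
QUANT_SCORES = {2.55: 40.24, 2.8: 63.41, 3.0: 66.46, 4.625: 70.12, 4.8: 70.73}

def retained(score: float, baseline: float = BASELINE) -> float:
    """Share of the baseline score kept by a quant, in percent."""
    return 100.0 * score / baseline

for bpw, score in QUANT_SCORES.items():
    print(f"{bpw:>5} bpw: {retained(score):.1f}% of baseline")
```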

	## Downloads

	A plain `git clone` downloads the weights of all the quants, which is probably not
	what you want. Instead, download the following common files into a single directory:

	* [config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/config.json)
	* [generation_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/generation_config.json)
	* [special_tokens_map.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/blob/main/special_tokens_map.json)
	* [tokenizer.model](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer.model)
	* [tokenizer_config.json](https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/raw/main/tokenizer_config.json)

	Then add the weights of one particular quant: all `.safetensors` files plus the `model.safetensors.index.json` file from that quant's directory.

	Either download these files via the Web UI, or, e.g., with curl:
	```
	mkdir phind-2.55
	cd phind-2.55
	# resolve/main serves the actual file; raw/main returns only a Git LFS pointer
	# for LFS-tracked files, and blob/main returns the HTML page.
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/config.json
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/generation_config.json
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/special_tokens_map.json
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/tokenizer.model
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/tokenizer_config.json
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/model.safetensors.index.json
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00001-of-00002.safetensors
	curl -LO https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2/resolve/main/2.55/output-00002-of-00002.safetensors
	```
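
The same file list can be generated in a script. The sketch below only builds the URLs for one quant. It assumes Hugging Face's `resolve/main` URL form (which serves file contents rather than a Git LFS pointer) and that the shard count of the chosen quant is known. `quant_urls` and its parameters are hypothetical names, not part of this repo.

```python
REPO = "https://huggingface.co/latimar/Phind-Codellama-34B-v2-megacode-exl2"

# Files shared by all quants (the list given above)
COMMON_FILES = [
    "config.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
    "tokenizer_config.json",
]

def quant_urls(bpw: str, shards: int) -> list[str]:
    """Return download URLs for the common files plus one quant's weights."""
    base = f"{REPO}/resolve/main"
    urls = [f"{base}/{name}" for name in COMMON_FILES]
    urls.append(f"{base}/{bpw}/model.safetensors.index.json")
    # Shard names follow the output-0000N-of-0000M.safetensors pattern
    for i in range(1, shards + 1):
        urls.append(f"{base}/{bpw}/output-{i:05d}-of-{shards:05d}.safetensors")
    return urls
```

`quant_urls("2.55", 2)` covers the eight files needed for the 2.55 quant. The shard count of the other quants has to be checked in their directories.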

	## Datasets used for calibration and PPL measurement

	* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
	* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
	* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)

	### Conversion

	Conversion arguments:

	```
	convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
	```

	The `2.55` quant was converted using more rows: `-r 400 -mr 64`
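
For converting several quants in a loop, the argument list can be assembled programmatically. This is a minimal sketch that uses only the flags shown above; `convert_args` and its parameter names are hypothetical, and the row counts for `2.55` follow the note above.

```python
def convert_args(bpw: float, model_dir: str, wip_dir: str,
                 out_dir: str, cal_dataset: str) -> list[str]:
    """Build the convert.py argument list for one target BPW."""
    # The 2.55 quant used doubled row counts (-r 400 -mr 64)
    rows, measurement_rows = (400, 64) if bpw == 2.55 else (200, 32)
    return [
        "convert.py",
        "-i", model_dir, "-o", wip_dir, "-cf", out_dir, "-c", cal_dataset,
        "-r", str(rows), "-mr", str(measurement_rows),
        "-l", "4096", "-ml", "4096", "-hb", "8", "-b", str(bpw),
    ]
```

The result can be passed to `subprocess.run(["python", *convert_args(...)], check=True)` from the exllamav2 checkout.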

	### Perplexity

	Perplexity was measured with [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:
	```
	test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
	```
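
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that standard definition on a list of per-token natural-log probabilities. It is not taken from `test_inference.py`, and `perplexity` is a hypothetical helper name.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.5 to every token has a perplexity of 2, so lower values in the table mean the quant predicts the dataset better.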

	### Human-Eval

	#### Evaluation

	Samples for the Human-Eval scores of EXL2 quants were generated with [exl2.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
	script:
	```
	python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 -o ${BPW}-samples.jsonl
	```

	Human-Eval samples of NF4/INT8 quants were generated with [tf.human-eval.py](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:
	```
	python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
	```
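
Turning evaluated samples into a score is a simple ratio. The sketch below assumes one sample per problem and a results file in JSON Lines form where each record carries a boolean `passed` field. Both are assumptions about the evaluation harness, and `pass_at_1` is a hypothetical helper name.

```python
import json

def pass_at_1(result_lines) -> float:
    """Percentage of tasks whose single sample passed its tests."""
    # Skip blank lines, parse one JSON record per remaining line
    results = [json.loads(line) for line in result_lines if line.strip()]
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for r in results if r["passed"])
    return 100.0 * passed / len(results)
```

HumanEval has 164 problems, so 116 passing samples would give 70.73 under these assumptions.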

	#### Comparison

	Phind reports that the original full-weight model achieves a Human-Eval score of 73.8.
	The NF4 quant scores 70.73 in my run.

	Claimed Human-Eval scores of WizardCoder models (full weights):

	| Model                       | Score |
	| --------------------------- | ----- |
	| WizardCoder-Python-34B-V1.0 | 73.2  |
	| WizardCoder-Python-13B-V1.0 | 64.0  |

	Vanilla Mistral-7B in INT8 scores 27.43.

	[EXL2 3.2-bpw quant](https://huggingface.co/firelzrd/Phind-CodeLlama-34B-v2-exl2/tree/3_2-bpw) of this model by [firelzrd](https://huggingface.co/firelzrd)
	scores 60.97.