	---
	base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
	inference: false
	license: llama2
	model_creator: https://huggingface.co/Phind
	model_name: Phind-Codellama-34B-v2
	model_type: llama
	quantized_by: latimar
	---

	# Phind-CodeLlama-34B-v2 EXL2

	Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
	to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

	Each quant lives in its own branch, as in TheBloke's GPTQ repos. To fetch a single quant, clone only its branch:

	```
	export BRANCH=5_0-bpw-h8
	git clone --single-branch --branch ${BRANCH} https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
	```

	The following branches are available:

	```
	5_0-bpw-h8
	4_625-bpw-h6
	4_125-bpw-h6
	2_75-bpw-h6
	2_55-bpw-h6
	```
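
The branch names appear to follow the pattern `<bpw with "." replaced by "_">-bpw-h<head bits>`. The sketch below builds a branch name from those two numbers and checks it against the list above before anything is cloned. `branch_for` and `is_known_branch` are hypothetical helper names, and the naming rule is inferred from the branch list rather than documented anywhere.

```shell
# Branches listed in this README.
BRANCHES="5_0-bpw-h8 4_625-bpw-h6 4_125-bpw-h6 2_75-bpw-h6 2_55-bpw-h6"

# branch_for <bpw> <head_bits>: build a branch name, e.g. 4.625 6 -> 4_625-bpw-h6.
# Hypothetical helper; the naming rule is inferred from the list above.
branch_for() {
  printf '%s-bpw-h%s\n' "$(printf '%s' "$1" | tr '.' '_')" "$2"
}

# is_known_branch <name>: succeed only if <name> is one of the listed branches.
is_known_branch() {
  case " $BRANCHES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `BRANCH=$(branch_for 4.625 6)` followed by `is_known_branch "$BRANCH"` can guard the `git clone` command so that a mistyped quant fails early instead of during the download.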

	* Calibration dataset used for conversion: [wikitext-2-v1, test split](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet)
	* Evaluation dataset used to calculate perplexity: [wikitext-2-v1, validation split](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
	* Max sequence length used for perplexity (PPL) evaluation: 1792. Evaluating 5.0-bpw-h8 at 2048 ran out of memory (OOM) on an RTX 4090, so the length had to be reduced.
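
For readers unfamiliar with the metric: perplexity is conventionally the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below shows only that arithmetic on a list of per-token natural-log probabilities. `ppl_from_logprobs` is a hypothetical helper, and this is not the evaluation script that produced the numbers in this README.

```shell
# ppl_from_logprobs: read one natural-log token probability per line on stdin
# and print exp(-mean), i.e. the perplexity of that token sequence.
# Hypothetical helper for illustration only.
ppl_from_logprobs() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.4f\n", exp(-sum / n) }'
}
```

A model that assigns probability 0.5 to every token (log-probability of about -0.693147) has a perplexity of about 2, which `printf '%s\n' -0.693147 -0.693147 | ppl_from_logprobs` reproduces.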


	| BPW      | Perplexity | File Size (GB) |
	| -------- | ---------- | -------------- |
	| 2.55-h6  | 15.0901    | 10.56          |
	| 2.75-h6  | 13.6153    | 11.33          |
	| 4.125-h6 | 6.8095     | 16.65          |
	| 4.625-h6 | 6.7992     | 18.58          |
	| 5.0-h8   | 6.7785     | 20.09          |
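
One way to use the table is to pick the largest quant whose file size fits a given budget. The sketch below hard-codes the rows above and assumes each table label maps to the corresponding branch name (for example `2.55-h6` to `2_55-bpw-h6`). `largest_quant_under` is a hypothetical helper. File size is only a lower bound on memory use, since inference also needs room for the cache and activations.

```shell
# Table rows from this README: branch, perplexity, file size.
QUANTS="2_55-bpw-h6 15.0901 10.56
2_75-bpw-h6 13.6153 11.33
4_125-bpw-h6 6.8095 16.65
4_625-bpw-h6 6.7992 18.58
5_0-bpw-h8 6.7785 20.09"

# largest_quant_under <budget>: print the branch with the largest file size
# that is still <= budget (same unit as the table); print nothing if none fit.
# Hypothetical helper; the label-to-branch mapping is an assumption.
largest_quant_under() {
  printf '%s\n' "$QUANTS" | awk -v budget="$1" '
    ($3 + 0) <= (budget + 0) && ($3 + 0) > (best + 0) { best = $3; name = $1 }
    END { if (name != "") print name }'
}
```

For example, `export BRANCH=$(largest_quant_under 17)` selects `4_125-bpw-h6`, which can then be passed to the `git clone` command.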