---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---
# Phind-CodeLlama-34B-v2 EXL2
Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.
Each quant is stored in a separate branch, as in TheBloke's GPTQ repos. To download only one of them, clone a single branch:
```
export BRANCH=5_0-bpw-h8
git clone --single-branch --branch ${BRANCH} https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
```
The following branches are available:
```
5_0-bpw-h8
4_625-bpw-h6
4_125-bpw-h6
3_8-bpw-h6
2_75-bpw-h6
2_55-bpw-h6
```
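The branch names appear to encode the quantization settings. The part before `-bpw` is the average bits per weight, with `_` standing in for the decimal point. `hN` looks like the bit width of the output (head) layer, which matches the `-hb`/`--head_bits` option of exllamav2's converter. Under that assumption, the sketch below maps each branch to the label used in the perplexity table. The `branch_to_label` function is hypothetical and is not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a branch name such as "4_625-bpw-h6" into the
# label used in the perplexity table ("4.625-h6").
# Assumed pattern: <bpw with "_" as decimal point>-bpw-h<head-layer bits>.
branch_to_label() {
  local name="$1"
  local bpw="${name%%-bpw-*}"   # "4_625-bpw-h6" -> "4_625"
  local head="${name##*-bpw-}"  # "4_625-bpw-h6" -> "h6"
  printf '%s-%s\n' "${bpw/_/.}" "$head"
}

for branch in 5_0-bpw-h8 4_625-bpw-h6 4_125-bpw-h6 3_8-bpw-h6 2_75-bpw-h6 2_55-bpw-h6; do
  printf '%-14s -> %s\n' "$branch" "$(branch_to_label "$branch")"
done
```
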
* Calibration dataset used for conversion: [wikitext-2-v1](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet) (test split)
* Evaluation dataset used to calculate perplexity: [wikitext-2-v1](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet) (validation split)
* Maximum sequence length used for perplexity evaluation: 1792 tokens. Evaluating the 5.0-bpw-h8 quant at 2048 ran out of memory on an RTX 4090, so the length was reduced.

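
For reference, perplexity is conventionally defined as the exponential of the average negative log-likelihood that the model assigns to the $N$ evaluated tokens. Lower values are better. The formula below is the standard definition. It does not describe the exact way the evaluation script chunks the text into sequences of up to 1792 tokens:

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right)
$$

For example, the 5.0-h8 score of 6.7785 corresponds to an average loss of $\ln 6.7785 \approx 1.91$ nats per token. The 2.55-h6 score of 15.0901 corresponds to $\ln 15.0901 \approx 2.71$.
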
| BPW | Perplexity | File Size (GB) |
| ----------- | ----------- | -------------- |
| 2.55-h6 | 15.0901 | 10.56 |
| 2.75-h6 | 13.6153 | 11.33 |
| 3.8-h6 | 6.8803 | 15.37 |
| 4.125-h6 | 6.8095 | 16.65 |
| 4.625-h6 | 6.7992 | 18.58 |
| 5.0-h8 | 6.7785 | 20.09 |
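
To choose a quant by how much space its weights may take, you can query the table directly. The sketch below returns the lowest-perplexity row whose file size fits a budget in GB. The `pick_quant` function name is made up for illustration. File size only covers the weights. The KV cache and activations need extra VRAM on top of it, as the OOM note above shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowest-perplexity quant whose weight files fit
# within a size budget in GB. Rows are copied from the table above
# (label, perplexity, file size). File size only covers the weights; the
# KV cache and activations need additional VRAM on top of it.
pick_quant() {
  local budget="$1"
  awk -v budget="$budget" '
    ($3 + 0) <= (budget + 0) && (best == "" || ($2 + 0) < best_ppl) {
      best = $1
      best_ppl = $2 + 0
    }
    END {
      if (best == "") exit 1   # nothing fits the budget
      print best
    }
  ' <<'EOF'
2.55-h6 15.0901 10.56
2.75-h6 13.6153 11.33
3.8-h6 6.8803 15.37
4.125-h6 6.8095 16.65
4.625-h6 6.7992 18.58
5.0-h8 6.7785 20.09
EOF
}

pick_quant 16
```

With a 16 GB budget this prints `3.8-h6`, and with 20 GB it prints `4.625-h6`. If nothing fits, the function prints nothing and returns a non-zero status.
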