bartowski/Mistral-22B-v0.2-exl2

bartowski committed on Apr 15

Commit 03a288a (1 parent: 0f5ee7f)

Update VRAM estimates

Files changed (1)

README.md +9 -14

@@ -12,10 +12,6 @@ Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.18">turb
 Each branch contains an individual bits per weight, with the main one containing only the measurement.json for further conversions.
-Conversion was done using the default calibration dataset.
-Default arguments used except when the bits per weight is above 6.0, at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
 Original model: https://huggingface.co/Vezora/Mistral-22B-v0.2
 ## Prompt Format
@@ -26,17 +22,16 @@ Original model: https://huggingface.co/Vezora/Mistral-22B-v0.2
 ### Assistant:
 ```
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/8_0">8.0 bits per weight</a>
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/6_5">6.5 bits per weight</a>
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/5_0">5.0 bits per weight</a>
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/4_25">4.25 bits per weight</a>
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_5">3.5 bits per weight</a>
-<a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_0">3.0 bits per weight</a>
+## Available sizes
+| Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
+| ------ | ---- | ------------ | ---- | ---- | ---- | ----------- |
+| [8_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/8_0)   | 8.0  | 8.0 | 23.5 GB | 26.0 GB | 29.5 GB | Near unquantized performance, max quality ExLlamaV2 can create.     |
+| [6_5](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/6_5)   | 6.5  | 8.0 | 19.4 GB | 21.9 GB | 25.4 GB | Near unquantized performance at vastly reduced size, **recommended**.       |
+| [5_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/5_0)   | 5.0  | 6.0 | 15.5 GB | 18.0 GB | 21.5 GB | Smaller size, lower quality, still very high performance, **recommended**.       |
+| [4_25](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/4_25) | 4.25 | 6.0 | 13.3 GB | 15.8 GB | 19.3 GB | GPTQ equivalent bits per weight, slightly higher quality.                   |
+| [3_5](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_5)   | 3.5  | 6.0 | 11.6 GB | 14.1 GB | 17.6 GB | Lower quality, only use if you have to.                                     |
+| [3_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_0)   | 3.0  | 6.0 | 9.8 GB | 12.3 GB | 15.8 GB | Very low quality. Usable on 12 GB with low context or 16 GB with 32k. |
 ## Download instructions
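
The VRAM estimates added in this commit can be read as a simple lookup: given a VRAM budget and a context size, take the highest bits-per-weight branch whose estimate still fits. The following is a minimal sketch of that idea, not part of the repository. The `VRAM_GB` dictionary and the `pick_branch` helper are hypothetical names introduced for illustration, the numbers are copied from the table above, and the figures are estimates, so real usage may differ.

```python
# VRAM estimates in GB, copied from the "Available sizes" table in this commit.
# Keys are branch names; inner keys are the context-size columns of the table.
# Branches are listed from highest to lowest bits per weight.
VRAM_GB = {
    "8_0":  {"4k": 23.5, "16k": 26.0, "32k": 29.5},
    "6_5":  {"4k": 19.4, "16k": 21.9, "32k": 25.4},
    "5_0":  {"4k": 15.5, "16k": 18.0, "32k": 21.5},
    "4_25": {"4k": 13.3, "16k": 15.8, "32k": 19.3},
    "3_5":  {"4k": 11.6, "16k": 14.1, "32k": 17.6},
    "3_0":  {"4k": 9.8,  "16k": 12.3, "32k": 15.8},
}


def pick_branch(vram_gb, context="4k"):
    """Return the highest-bpw branch whose estimate fits in vram_gb, or None.

    Hypothetical helper: it only compares against the table's estimates and
    leaves no headroom for other processes using the GPU.
    """
    # Dicts preserve insertion order, so this walks from 8_0 down to 3_0.
    for branch, estimates in VRAM_GB.items():
        if estimates[context] <= vram_gb:
            return branch
    return None


if __name__ == "__main__":
    print(pick_branch(24))          # budget of 24 GB at 4k context
    print(pick_branch(16, "32k"))   # budget of 16 GB at 32k context
```

A return value of `None` means no branch fits the given budget at that context size according to the table, in which case a smaller context is the only option listed here.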