mo137 committed on
Commit 761df1a
1 Parent(s): ed80501

Update README.md

Files changed (1): README.md +29 -0
README.md CHANGED
@@ -1,3 +1,32 @@
  ---
  license: cc-by-nc-4.0
+ tags:
+ - exllamav2
+ - exl2
+ - Text Generation
+ - not-for-all-audiences
+ - nsfw
+ - Transformers
+ - llama
+ - text-generation-inference
  ---
+
+ # Amethyst 13B Mistral - EXL2 - 8bpw, hb8
+ - Model creator: [Undi](https://huggingface.co/Undi95)
+ - Original model: [Amethyst 13B Mistral](https://huggingface.co/Undi95/Amethyst-13B-Mistral)
+
+ ## Description
+ - 8 bits per weight.
+ - 8 bits "for the lm_head (output) layer of the model," instead of the typical 6.
+ - Works fine with 24 GB VRAM and no flash attention v2 under Windows.
+ - For me, it runs at about 64% of the speed of the 4-bit GPTQ version.
+
+ I converted the model using the convert.py script from the exllamav2 repo:
+ https://github.com/turboderp/exllamav2
+ Its documentation:
+ https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
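
A conversion along these lines can be sketched as follows. The exact command is not recorded in this commit, so the directories are placeholders and the flags (`-i`, `-o`, `-cf`, `-b`, `-hb`) reflect my reading of the convert.md documentation linked above, not a command copied from the author:

```shell
# Sketch only: paths are placeholders, flags per exllamav2's convert.md.
python convert.py \
    -i  /models/Amethyst-13B-Mistral \
    -o  /tmp/exl2-work \
    -cf /models/Amethyst-13B-Mistral-exl2-8bpw-hb8 \
    -b  8.0 \
    -hb 8
```

Here `-b 8.0` sets the average bits per weight and `-hb 8` raises the lm_head precision from the typical 6 bits to 8, matching the description above.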
+
+ Measuring the model took 51 minutes; converting it took 18 minutes.
+
+ I used the WikiText-2-v1 dataset for calibration:
+ https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet