# superhot-30b-8k-4bit-128g-safetensors

Merged base LLaMA and LoRA with this:
https://github.com/tloen/alpaca-lora

Base LLaMA 30B:
https://huggingface.co/huggyllama/llama-30b

SuperHOT 30B 8k LoRA:
https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test

``` sh
BASE_MODEL=huggyllama_llama-30b LORA=kaiokendev_superhot-30b-8k-no-rlhf-test python export_hf_checkpoint.py
```
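
For reference, the merge that `export_hf_checkpoint.py` performs can be sketched with the `peft` API. This is an illustration of the same idea, not the script's exact contents:

``` python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# Load the fp16 base model, then attach the 8k LoRA on top of it.
base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-30b", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "kaiokendev/superhot-30b-8k-no-rlhf-test")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
model = model.merge_and_unload()

# Save the merged checkpoint as safetensors for the quantization step below.
model.save_pretrained("superhot-30b-8k-safetensors", safe_serialization=True)
```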

Quantized with AutoGPTQ:
https://github.com/PanQiWei/AutoGPTQ

``` sh
python quant_with_alpaca.py --pretrained_model_dir superhot-30b-8k-safetensors --quantized_model_dir superhot-30b-8k-4bit-128g-safetensors --bits 4 --group_size 128 --desc_act --num_samples 256 --save_and_reload
```
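
Those flags map directly onto AutoGPTQ's Python API. A minimal sketch of the same quantization (the real `quant_with_alpaca.py` draws its 256 calibration samples from the Alpaca dataset; the single example below is only a placeholder):

``` python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# Mirrors the CLI flags above: 4-bit weights, group size 128, act-order.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

model = AutoGPTQForCausalLM.from_pretrained(
    "superhot-30b-8k-safetensors", quantize_config
)

# Placeholder calibration data; the script uses 256 Alpaca samples instead.
tokenizer = AutoTokenizer.from_pretrained("superhot-30b-8k-safetensors")
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

model.quantize(examples)
model.save_quantized("superhot-30b-8k-4bit-128g-safetensors", use_safetensors=True)
```

The result can be loaded back with `AutoGPTQForCausalLM.from_quantized("superhot-30b-8k-4bit-128g-safetensors", use_safetensors=True)`.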

Perplexity:
```
CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
    -d /workspace/models/superhot-30b-8k-4bit-128g-safetensors \
    -ppl \
    -ppl_ds datasets/wikitext2.txt \
    -l 8192 \
    -cpe 4 \
    -ppl_cn 40 \
    -ppl_cs 8192 \
    -ppl_ct 8192
 -- Perplexity:
 -- - Dataset: datasets/wikitext2.txt
 -- - Chunks: 40
 -- - Chunk size: 8192 -> 8192
 -- - Chunk overlap: 0
 -- - Min. chunk size: 50
 -- - Key: text
 -- Tokenizer: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/tokenizer.model
 -- Model config: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/config.json
 -- Model: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/4bit-128g.safetensors
 -- Sequence length: 8192
 -- RoPE compression factor: 4.0
 -- Tuning:
 -- --matmul_recons_thd: 8
 -- --fused_mlp_thd: 2
 -- --sdp_thd: 8
 -- Options: ['perplexity']
 ** Time, Load model: 4.31 seconds
 ** Time, Load tokenizer: 0.01 seconds
 -- Groupsize (inferred): 128
 -- Act-order (inferred): yes
 ** VRAM, Model: [cuda:0] 17,043.70 MB
 -- Loading dataset...
 -- Testing 40 chunks....
 ** Perplexity: 4.6612
```
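
Here `-l 8192` sets the evaluation sequence length and `-cpe 4` sets the RoPE compression factor: position indices are divided by 4, so 8192 positions are squeezed into the 0-2048 range the base model was pretrained on, matching how the SuperHOT 8k LoRA was finetuned. A rough sketch of that position interpolation (illustrative only, not exllama's actual implementation):

``` python
import torch

def interpolated_rope_angles(dim, positions, scale=4.0, base=10000.0):
    """Rotary embedding angles with linear position interpolation.

    Dividing positions by `scale` compresses positions 0..8191 into the
    0..2047.75 range the base model saw during pretraining.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = positions.float() / scale      # the "compression factor"
    freqs = torch.outer(t, inv_freq)   # (seq_len, dim/2) angle table
    return torch.cos(freqs), torch.sin(freqs)

# Position 4096 under compression 4 lands on the same rotary angles as
# position 1024 without compression.
cos_a, sin_a = interpolated_rope_angles(128, torch.tensor([4096]), scale=4.0)
cos_b, sin_b = interpolated_rope_angles(128, torch.tensor([1024]), scale=1.0)
assert torch.allclose(cos_a, cos_b) and torch.allclose(sin_a, sin_b)
```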