3v324v23 committed
Commit: 7cc788c
1 Parent(s): 2d09b37

Update README

Files changed (1)
  1. README.md +48 -4
README.md CHANGED
@@ -1,15 +1,59 @@
  # superhot-30b-8k-4bit-128g-safetensors

- Merged base LLaMA and LoRA with this: https://github.com/tloen/alpaca-lora
- Base LLaMA 30B: https://huggingface.co/huggyllama/llama-30b
- SuperCOT 30B 8k LoRA: https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test
+ Merged base LLaMA and LoRA with this:
+ https://github.com/tloen/alpaca-lora
+
+ Base LLaMA 30B:
+ https://huggingface.co/huggyllama/llama-30b
+
+ SuperCOT 30B 8k LoRA:
+ https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test

  ``` sh
  BASE_MODEL=huggyllama_llama-30b LORA=kaiokendev_superhot-30b-8k-no-rlhf-test python export_hf_checkpoint.py
  ```

- Quantized with AutoGPTQ: https://github.com/PanQiWei/AutoGPTQ
+ Quantized with AutoGPTQ:
+ https://github.com/PanQiWei/AutoGPTQ

  ``` sh
  python quant_with_alpaca.py --pretrained_model_dir superhot-30b-8k-safetensors --quantized_model_dir superhot-30b-8k-4bit-128g-safetensors --bits 4 --group_size 128 --desc_act --num_samples 256 --save_and_reload
  ```
+
+ Perplexity:
+ ```
+ CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
+ -d /workspace/models/superhot-30b-8k-4bit-128g-safetensors \
+ -ppl \
+ -ppl_ds datasets/wikitext2.txt \
+ -l 8192 \
+ -cpe 4 \
+ -ppl_cn 40 \
+ -ppl_cs 8192 \
+ -ppl_ct 8192
+ -- Perplexity:
+ -- - Dataset: datasets/wikitext2.txt
+ -- - Chunks: 40
+ -- - Chunk size: 8192 -> 8192
+ -- - Chunk overlap: 0
+ -- - Min. chunk size: 50
+ -- - Key: text
+ -- Tokenizer: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/tokenizer.model
+ -- Model config: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/config.json
+ -- Model: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/4bit-128g.safetensors
+ -- Sequence length: 8192
+ -- RoPE compression factor: 4.0
+ -- Tuning:
+ -- --matmul_recons_thd: 8
+ -- --fused_mlp_thd: 2
+ -- --sdp_thd: 8
+ -- Options: ['perplexity']
+ ** Time, Load model: 4.31 seconds
+ ** Time, Load tokenizer: 0.01 seconds
+ -- Groupsize (inferred): 128
+ -- Act-order (inferred): yes
+ ** VRAM, Model: [cuda:0] 17,043.70 MB
+ -- Loading dataset...
+ -- Testing 40 chunks....
+ ** Perplexity: 4.6612
+ ```
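The merge step above relies on alpaca-lora's export_hf_checkpoint.py. For reference, a minimal sketch of the same merge done directly with the peft API could look like the following; the output directory name matches the --pretrained_model_dir expected by the quantization command, and the dtype and loading options are assumptions rather than anything this repo prescribes.

```python
# Sketch: fold the SuperHOT 8k LoRA into the base LLaMA 30B weights with peft,
# as an alternative to alpaca-lora's export_hf_checkpoint.py.
# Output path "superhot-30b-8k-safetensors" is an assumption (it matches the
# --pretrained_model_dir used by the quantization command in the README).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(base, "kaiokendev/superhot-30b-8k-no-rlhf-test")
model = model.merge_and_unload()  # bake the LoRA deltas into the base weights

model.save_pretrained("superhot-30b-8k-safetensors", safe_serialization=True)
LlamaTokenizer.from_pretrained("huggyllama/llama-30b").save_pretrained(
    "superhot-30b-8k-safetensors"
)
```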
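The quantization command uses the quant_with_alpaca.py example from the AutoGPTQ repo. A minimal sketch of loading the resulting 4-bit, group-size-128 checkpoint back through AutoGPTQ might be; the prompt and device are placeholders.

```python
# Sketch: load the quantized checkpoint produced above with AutoGPTQ and run a short generation.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "superhot-30b-8k-4bit-128g-safetensors"  # the --quantized_model_dir from above
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,  # the weights were exported as safetensors
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

Note that loading the checkpoint this way does not by itself extend the context window: the 8k context depends on the RoPE scaling the SuperHOT LoRA was trained for, which is why the perplexity benchmark above passes -cpe 4 (reported in its log as "RoPE compression factor: 4.0").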
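The -cpe 4 flag and the "RoPE compression factor: 4.0" line correspond to linearly scaling rotary position indices so that the 8192-token window maps onto the 2048-position range the base LLaMA model was trained with (8192 / 4 = 2048). A small illustrative sketch of that scaling follows; the function and variable names are ours, not the benchmark script's internals.

```python
# Illustrative sketch of linear RoPE position scaling with a compression factor of 4:
# 8192 positions are squeezed into the base model's original 0..2047 range.
import torch

def rope_angles(head_dim: int, seq_len: int, compress: float = 4.0, base: float = 10000.0) -> torch.Tensor:
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / compress  # position 8191 -> ~2047.75
    return torch.outer(positions, inv_freq)               # angles used to build the sin/cos caches

angles = rope_angles(head_dim=128, seq_len=8192)
print(angles.shape)  # torch.Size([8192, 64])
```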