# superhot-30b-8k-4bit-128g-safetensors

Merged base LLaMA and LoRA with this:
https://github.com/tloen/alpaca-lora

Base LLaMA 30B:
https://huggingface.co/huggyllama/llama-30b

SuperHOT 30B 8k LoRA:
https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test

``` sh
BASE_MODEL=huggyllama_llama-30b LORA=kaiokendev_superhot-30b-8k-no-rlhf-test python export_hf_checkpoint.py
```
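
For reference, the merge that `export_hf_checkpoint.py` performs can be sketched with the `peft` API. This is an illustration of the same idea, not the script's exact contents:

``` python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# Load the fp16 base model, then attach the 8k LoRA on top of it.
base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-30b", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "kaiokendev/superhot-30b-8k-no-rlhf-test")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
model = model.merge_and_unload()

# Save the merged checkpoint as safetensors for the quantization step below.
model.save_pretrained("superhot-30b-8k-safetensors", safe_serialization=True)
```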

Quantized with AutoGPTQ:
https://github.com/PanQiWei/AutoGPTQ

``` sh
python quant_with_alpaca.py --pretrained_model_dir superhot-30b-8k-safetensors --quantized_model_dir superhot-30b-8k-4bit-128g-safetensors --bits 4 --group_size 128 --desc_act --num_samples 256 --save_and_reload
```
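
Those flags map directly onto AutoGPTQ's Python API. A minimal sketch of the same quantization (the real `quant_with_alpaca.py` draws its 256 calibration samples from the Alpaca dataset; the single example below is only a placeholder):

``` python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# Mirrors the CLI flags above: 4-bit weights, group size 128, act-order.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

model = AutoGPTQForCausalLM.from_pretrained(
    "superhot-30b-8k-safetensors", quantize_config
)

# Placeholder calibration data; the script uses 256 Alpaca samples instead.
tokenizer = AutoTokenizer.from_pretrained("superhot-30b-8k-safetensors")
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

model.quantize(examples)
model.save_quantized("superhot-30b-8k-4bit-128g-safetensors", use_safetensors=True)
```

The result can be loaded back with `AutoGPTQForCausalLM.from_quantized("superhot-30b-8k-4bit-128g-safetensors", use_safetensors=True)`.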

Perplexity:
```
CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
    -d /workspace/models/superhot-30b-8k-4bit-128g-safetensors \
    -ppl \
    -ppl_ds datasets/wikitext2.txt \
    -l 8192 \
    -cpe 4 \
    -ppl_cn 40 \
    -ppl_cs 8192 \
    -ppl_ct 8192
 -- Perplexity:
 -- - Dataset: datasets/wikitext2.txt
 -- - Chunks: 40
 -- - Chunk size: 8192 -> 8192
 -- - Chunk overlap: 0
 -- - Min. chunk size: 50
 -- - Key: text
 -- Tokenizer: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/tokenizer.model
 -- Model config: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/config.json
 -- Model: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/4bit-128g.safetensors
 -- Sequence length: 8192
 -- RoPE compression factor: 4.0
 -- Tuning:
 -- --matmul_recons_thd: 8
 -- --fused_mlp_thd: 2
 -- --sdp_thd: 8
 -- Options: ['perplexity']
 ** Time, Load model: 4.31 seconds
 ** Time, Load tokenizer: 0.01 seconds
 -- Groupsize (inferred): 128
 -- Act-order (inferred): yes
 ** VRAM, Model: [cuda:0] 17,043.70 MB
 -- Loading dataset...
 -- Testing 40 chunks....
 ** Perplexity: 4.6612
```
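
Here `-l 8192` sets the evaluation sequence length and `-cpe 4` sets the RoPE compression factor: position indices are divided by 4, so 8192 positions are squeezed into the 0-2048 range the base model was pretrained on, matching how the SuperHOT 8k LoRA was finetuned. A rough sketch of that position interpolation (illustrative only, not exllama's actual implementation):

``` python
import torch

def interpolated_rope_angles(dim, positions, scale=4.0, base=10000.0):
    """Rotary embedding angles with linear position interpolation.

    Dividing positions by `scale` compresses positions 0..8191 into the
    0..2047.75 range the base model saw during pretraining.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = positions.float() / scale      # the "compression factor"
    freqs = torch.outer(t, inv_freq)   # (seq_len, dim/2) angle table
    return torch.cos(freqs), torch.sin(freqs)

# Position 4096 under compression 4 lands on the same rotary angles as
# position 1024 without compression.
cos_a, sin_a = interpolated_rope_angles(128, torch.tensor([4096]), scale=4.0)
cos_b, sin_b = interpolated_rope_angles(128, torch.tensor([1024]), scale=1.0)
assert torch.allclose(cos_a, cos_b) and torch.allclose(sin_a, sin_b)
```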