---
license: other
---

# superhot-30b-8k-4bit-128g-safetensors

**Note: max_seq_len (maximum sequence length) must be set to 8192 (or lower) and compress_pos_emb (positional embedding compression factor) must be set to 4.**
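
In exllama, for example, these correspond to two fields on the model config. A minimal sketch, assuming exllama's repo layout (`ExLlamaConfig` lives in its `model` module) and a placeholder model path:

```python
# Minimal sketch: the two settings this model requires, set on an
# exllama config (the path is a placeholder; adjust to your setup).
from model import ExLlamaConfig

config = ExLlamaConfig("/workspace/models/superhot-30b-8k-4bit-128g-safetensors/config.json")
config.max_seq_len = 8192    # maximum sequence length: 8192 or lower
config.compress_pos_emb = 4  # positional embedding compression factor: must be 4
```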

Merged the base LLaMA model with the LoRA using alpaca-lora's export_hf_checkpoint.py:

https://github.com/tloen/alpaca-lora

Base LLaMA 30B:

https://huggingface.co/huggyllama/llama-30b

SuperHOT 30B 8k no-rlhf-test LoRA:

https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test

```sh
BASE_MODEL=huggyllama_llama-30b LORA=kaiokendev_superhot-30b-8k-no-rlhf-test python export_hf_checkpoint.py
```
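
For reference, the merge is roughly equivalent to the following PEFT snippet (an illustrative sketch, not the actual export_hf_checkpoint.py logic; the output directory name matches the quantization step below):

```python
# Rough sketch of the merge step using PEFT (illustrative only;
# alpaca-lora's export script does this with extra handling).
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "huggyllama/llama-30b", torch_dtype=torch.float16
)
lora = PeftModel.from_pretrained(base, "kaiokendev/superhot-30b-8k-no-rlhf-test")
merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("superhot-30b-8k-safetensors", safe_serialization=True)
```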

Quantized with AutoGPTQ:

https://github.com/PanQiWei/AutoGPTQ

```sh
python quant_with_alpaca.py --pretrained_model_dir superhot-30b-8k-safetensors --quantized_model_dir superhot-30b-8k-4bit-128g-safetensors --bits 4 --group_size 128 --desc_act --num_samples 256 --save_and_reload
```
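
The flags map to AutoGPTQ's quantization config: 4-bit weights, groups of 128 columns, and activation-order (`desc_act`) quantization. Roughly, as a sketch of what the example script builds internally:

```python
# Sketch: the quantization settings above as an AutoGPTQ config object.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization
    group_size=128,  # one set of quantization parameters per 128 columns
    desc_act=True,   # act-order: process columns by descending activation
)
```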

Perplexity, measured with exllama's test_benchmark_inference.py:

```
CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
    -d /workspace/models/superhot-30b-8k-4bit-128g-safetensors \
    -ppl \
    -ppl_ds datasets/wikitext2.txt \
    -l 8192 \
    -cpe 4 \
    -ppl_cn 40 \
    -ppl_cs 8192 \
    -ppl_ct 8192

-- Perplexity:
-- - Dataset: datasets/wikitext2.txt
-- - Chunks: 40
-- - Chunk size: 8192 -> 8192
-- - Chunk overlap: 0
-- - Min. chunk size: 50
-- - Key: text
-- Tokenizer: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/tokenizer.model
-- Model config: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/config.json
-- Model: /workspace/models/superhot-30b-8k-4bit-128g-safetensors/4bit-128g.safetensors
-- Sequence length: 8192
-- RoPE compression factor: 4.0
-- Tuning:
-- --matmul_recons_thd: 8
-- --fused_mlp_thd: 2
-- --sdp_thd: 8
-- Options: ['perplexity']
** Time, Load model: 4.31 seconds
** Time, Load tokenizer: 0.01 seconds
-- Groupsize (inferred): 128
-- Act-order (inferred): yes
** VRAM, Model: [cuda:0] 17,043.70 MB
-- Loading dataset...
-- Testing 40 chunks....
** Perplexity: 4.6612
```
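
For completeness, a generation sketch with the quantized model in exllama (module layout per the exllama repo; the path is a placeholder and the sampler settings are arbitrary):

```python
# Sketch: load the quantized model in exllama and generate a short
# completion (illustrative settings only).
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/workspace/models/superhot-30b-8k-4bit-128g-safetensors"

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/4bit-128g.safetensors"
config.max_seq_len = 8192    # extended context
config.compress_pos_emb = 4  # required for the SuperHOT 8k LoRA

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)

generator = ExLlamaGenerator(model, tokenizer, cache)
generator.settings.temperature = 0.7
generator.settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", max_new_tokens=64))
```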