---
license: other
---
# superhot-13b-8k-4bit-32g-safetensors

**Note: Set the maximum sequence length (max_seq_len) to 8192 or lower and the compression factor (compress_pos_emb) to 4.**
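
The two settings go together: a compression factor of 4 squeezes up to 8192 token positions into the range the base model saw during training. The following is a minimal sketch of that arithmetic, assuming the loader divides each position index by the compression factor before applying RoPE and that base LLaMA was trained with a 2048-token context. The function name `compressed_positions` and the constant `BASE_CONTEXT` are hypothetical and not part of any loader's API.

```python
def compressed_positions(seq_len, compress_pos_emb):
    """Return the (fractional) position index used for each token.

    Illustrative only: assumes linear position compression, i.e. each
    position is divided by the compression factor.
    """
    if compress_pos_emb <= 0:
        raise ValueError("compress_pos_emb must be positive")
    return [i / compress_pos_emb for i in range(seq_len)]


# Assumption: base LLaMA 13B was trained with a 2048-token context.
BASE_CONTEXT = 2048

positions = compressed_positions(seq_len=8192, compress_pos_emb=4)

# With factor 4, the last of 8192 positions still falls inside the base window.
print(len(positions), positions[-1], positions[-1] < BASE_CONTEXT)
```

Under these assumptions, a sequence longer than 8192 tokens at factor 4 would produce positions beyond the base window, which is why the sequence length has to stay at 8192 or lower.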

Merged base LLaMA and LoRA with this:
https://github.com/tloen/alpaca-lora

Base LLaMA 13B:
https://huggingface.co/huggyllama/llama-13b

SuperHOT 13B 8k no-rlhf-test LoRA:
https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test

``` sh
BASE_MODEL=huggyllama_llama-13b LORA=kaiokendev_superhot-13b-8k-no-rlhf-test python export_hf_checkpoint.py
```
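
Merging folds the LoRA update into the base weights so the result loads as a plain checkpoint. Below is a minimal sketch of the standard LoRA merge arithmetic, `W + (alpha / rank) * (B @ A)`, on toy matrices. It is not the code of `export_hf_checkpoint.py`; the helper names `matmul` and `merge_lora` and all values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (made-up values).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (rank, in_features)
lora_b = [[0.5], [0.25]]   # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After the merge, the adapter matrices are no longer needed at inference time, which is what allows the merged model to be quantized as a single set of weights.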

Quantized with AutoGPTQ:
https://github.com/PanQiWei/AutoGPTQ

``` sh
python quant_with_alpaca.py --pretrained_model_dir superhot-13b-8k-safetensors --quantized_model_dir superhot-13b-8k-no-rlhf-test-32g-GPTQ --bits 4 --group_size 32 --desc_act --num_samples 256 --save_and_reload
```
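
A smaller `--group_size` means more scale and zero-point entries per weight, so the file is somewhat larger than a plain 4-bit figure suggests. The sketch below estimates the effective bits per quantized weight, assuming one 16-bit scale and one 4-bit zero point per group and ignoring the index tensor that act-order adds. The function name `bits_per_weight` and its default values are assumptions for illustration, not AutoGPTQ API.

```python
def bits_per_weight(bits, group_size, scale_bits=16, zero_bits=4):
    """Estimate storage bits per weight for group-wise quantization.

    Assumption: each group of `group_size` weights shares one scale
    (`scale_bits`) and one zero point (`zero_bits`).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return bits + (scale_bits + zero_bits) / group_size


print(bits_per_weight(4, 32))    # group size used for this model
print(bits_per_weight(4, 128))   # a common larger group size, for comparison
```

Under these assumptions, group size 32 costs roughly half a bit per weight more than group size 128, in exchange for finer-grained scales.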

Perplexity:
```
CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
  -d /workspace/models/superhot-13b-8k-no-rlhf-test-32g-GPTQ \
  -ppl \
  -ppl_ds datasets/wikitext2.txt \
  -l 8192 \
  -cpe 4 \
  -ppl_cn 40 \
  -ppl_cs 8192 \
  -ppl_ct 8192

-- Perplexity:
-- - Dataset: datasets/wikitext2.txt
-- - Chunks: 40
-- - Chunk size: 8192 -> 8192
-- - Chunk overlap: 0
-- - Min. chunk size: 50
-- - Key: text
-- Tokenizer: /workspace/models/superhot-13b-8k-no-rlhf-test-32g-GPTQ/tokenizer.model
-- Model config: /workspace/models/superhot-13b-8k-no-rlhf-test-32g-GPTQ/config.json
-- Model: /workspace/models/superhot-13b-8k-no-rlhf-test-32g-GPTQ/4bit-32g.safetensors
-- Sequence length: 8192
-- RoPE compression factor: 4.0
-- Tuning:
-- --matmul_recons_thd: 8
-- --fused_mlp_thd: 2
-- --sdp_thd: 8
-- Options: ['perplexity']
** Time, Load model: 4.23 seconds
** Time, Load tokenizer: 0.01 seconds
-- Groupsize (inferred): 32
-- Act-order (inferred): yes
** VRAM, Model: [cuda:0] 7,732.62 MB
-- Loading dataset...
-- Testing 40 chunks....
** Perplexity: 5.4066
```
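
For readers unfamiliar with the metric, here is a minimal sketch of how a perplexity figure is conventionally derived: the exponential of the mean negative log-likelihood over all scored tokens. This is the textbook definition, assumed rather than taken from the benchmark script, and the function name `perplexity` and the toy inputs are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    `token_log_probs` holds the natural-log probability the model
    assigned to each actual next token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: every token predicted with probability 1/4,
# so the perplexity should come out close to 4.
log_probs = [math.log(0.25)] * 8
print(perplexity(log_probs))
```

Lower is better: a perplexity near 4 would mean the model is, on average, about as uncertain as a uniform choice among four tokens.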