---
license: other
---
# superhot-7b-8k-4bit--1g-safetensors
**Note: set the maximum sequence length (`max_seq_len`) to 8192 (or lower) and the compression factor (`compress_pos_emb`) to 4.**
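The compression factor works by linear position interpolation: token positions are divided by 4, so an 8192-token sequence maps into the 2048-position range the base model was trained on. A minimal sketch of the scaling itself (not the loader's actual implementation):

```python
# Linear RoPE position interpolation: divide each position index by the
# compression factor so 8192 positions fit inside the original 2048 range.
def compressed_positions(seq_len, compress_pos_emb):
    return [i / compress_pos_emb for i in range(seq_len)]

positions = compressed_positions(8192, 4)
print(positions[-1])  # 2047.75 -- stays inside the 0..2047 training range
```

This is why both settings must move together: raising `max_seq_len` without the matching `compress_pos_emb` pushes positions past what the base model ever saw.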
Merged the base LLaMA model and the LoRA using:
https://github.com/tloen/alpaca-lora
Base LLaMA 7B:
https://huggingface.co/huggyllama/llama-7b
SuperHOT 7B 8k no-rlhf-test LoRA:
https://huggingface.co/kaiokendev/superhot-7b-8k-no-rlhf-test
``` sh
BASE_MODEL=huggyllama_llama-7b LORA=kaiokendev_superhot-7b-8k-no-rlhf-test python export_hf_checkpoint.py
```
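Merging folds each LoRA adapter into its corresponding base weight as `W' = W + (alpha / r) * B @ A`. A toy numeric sketch of that update (illustrative only, not the export script above):

```python
import numpy as np

def merge_lora(W, A, B, alpha):
    """Fold a LoRA update into a base weight matrix.

    W: (out, in) base weight, A: (r, in), B: (out, r),
    with the standard LoRA scaling alpha / r.
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((2, 8))   # rank-2 adapter
B = rng.standard_normal((8, 2))
W_merged = merge_lora(W, A, B, alpha=16)
print(W_merged.shape)  # (8, 8) -- same shape, adapter absorbed
```

After the merge the adapter matrices can be discarded; the result is a plain checkpoint that AutoGPTQ can quantize directly.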
Quantized with AutoGPTQ:
https://github.com/PanQiWei/AutoGPTQ
``` sh
python quant_with_alpaca.py --pretrained_model_dir superhot-7b-8k-safetensors --quantized_model_dir superhot-7b-8k-no-rlhf-test-GPTQ --bits 4 --group_size -1 --desc_act --num_samples 256 --save_and_reload
```
Perplexity was measured with:
```
CUDA_VISIBLE_DEVICES=0 python test_benchmark_inference.py \
-d /workspace/models/superhot-7b-8k-no-rlhf-test-GPTQ \
-ppl \
-ppl_ds datasets/wikitext2.txt \
-l 8192 \
-cpe 4 \
-ppl_cn 40 \
-ppl_cs 8192 \
-ppl_ct 8192
-- Perplexity:
-- - Dataset: datasets/wikitext2.txt
-- - Chunks: 40
-- - Chunk size: 8192 -> 8192
-- - Chunk overlap: 0
-- - Min. chunk size: 50
-- - Key: text
-- Tokenizer: /workspace/models/superhot-7b-8k-no-rlhf-test-GPTQ/tokenizer.model
-- Model config: /workspace/models/superhot-7b-8k-no-rlhf-test-GPTQ/config.json
-- Model: /workspace/models/superhot-7b-8k-no-rlhf-test-GPTQ/4bit.safetensors
-- Sequence length: 8192
-- RoPE compression factor: 4.0
-- Tuning:
-- --matmul_recons_thd: 8
-- --fused_mlp_thd: 2
-- --sdp_thd: 8
-- Options: ['perplexity']
** Time, Load model: 2.74 seconds
** Time, Load tokenizer: 0.01 seconds
-- Groupsize (inferred): None
-- Act-order (inferred): no
!! Model has empty group index (discarded)
** VRAM, Model: [cuda:0] 3,652.09 MB
-- Loading dataset...
-- Testing 40 chunks....
** Perplexity: 7.0522
```
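The reported perplexity is the exponential of the mean per-token negative log-likelihood over the evaluated chunks. A short illustration of that relationship (the losses below are made up to reproduce the score, not taken from the benchmark):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nlls) / len(nlls))

# A mean NLL of ln(7.0522) corresponds to the perplexity reported above.
print(round(perplexity([math.log(7.0522)] * 40), 4))  # 7.0522
```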