Information

GPT4-X-Alpaca 30B quantized to 4-bit, compatible with the GPTQ versions used in Oobabooga's Text Generation Webui and KoboldAI.

There are three quantized versions:

- one quantized with GPTQ's --true-sequential and --act-order optimizations,
- one quantized with GPTQ's --true-sequential and --groupsize 128 optimizations, and
- one quantized for GGML using q4_1.

This was made using Chansung's GPT4-Alpaca LoRA: https://huggingface.co/chansung/gpt4-alpaca-lora-30b

Note: To run on your GPU with GPTQ, pick one of the .safetensors files along with all of the .json and .model files. To run on your CPU with GGML (llama.cpp), you only need the single .bin GGML file.
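The file-selection rule above can be sketched as a small helper. This is purely illustrative: the function name `pick_model_files` and the backend labels are hypothetical, not part of any loader's API.

```python
from pathlib import Path

def pick_model_files(files, backend):
    """Select which downloaded files a backend needs (illustrative sketch).

    - "gptq" (GPU): one .safetensors file plus all .json and .model files
    - "ggml" (CPU): only the single .bin GGML file
    """
    if backend == "gptq":
        safetensors = [f for f in files if Path(f).suffix == ".safetensors"]
        support = [f for f in files if Path(f).suffix in (".json", ".model")]
        return safetensors[:1] + support  # one weights file + all support files
    if backend == "ggml":
        return [f for f in files if Path(f).suffix == ".bin"]
    raise ValueError(f"unknown backend: {backend}")
```

For example, given a repo containing `model.safetensors`, `config.json`, `tokenizer.model`, and `ggml-model-q4_1.bin`, the "ggml" backend needs only the .bin file, while "gptq" takes the .safetensors plus the .json and .model files.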

Training Parameters

Benchmarks

--true-sequential --act-order

Wikitext2: 4.481280326843262
Ptb-New: 8.539161682128906
C4-New: 6.451964855194092

Note: This version does not use --groupsize 128, so its scores are slightly higher. However, it allows fitting the whole model at full context in only 24 GB of VRAM.

--true-sequential --groupsize 128

Wikitext2: 4.285132884979248
Ptb-New: 8.34856128692627
C4-New: 6.292652130126953

Note: This version uses --groupsize 128, resulting in better scores. However, it consumes more VRAM.
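The numbers above are perplexity scores, where lower is better. Perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch of the computation (assuming per-token losses in nats, as is conventional):

```python
import math

def perplexity(nlls):
    """Perplexity from a list of per-token negative log-likelihoods (nats)."""
    return math.exp(sum(nlls) / len(nlls))

# Sanity check: a model that assigns probability 1/2 to every token
# has a per-token NLL of ln(2) and therefore a perplexity of 2.
uniform_binary = [math.log(2)] * 4
```

So a Wikitext2 perplexity of ~4.3 roughly means the model is, on average, as uncertain as if it were choosing uniformly among about 4.3 tokens at each step.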