Performance tweaks

#8
by zxbc2023 - opened

Just to help other folks who stumble on this model: I had severe performance issues until I tweaked my load settings in text-generation-webui (oobabooga, Windows). On an RTX 2070 with 8GB VRAM I was getting as low as 1-2 tokens/s for a while until I changed my load settings. Now I get a consistent 10+ tokens/s, sometimes even 20+ t/s, and performance stays stable in long sessions. All I did was load the model with the ExLlama loader, with max_seq_len 4096 and compress_pos_emb 2. Generation starts almost immediately with hardly any delay, and it's very usable now.
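If you launch the webui from the command line instead of setting these in the UI, the same options can be passed as flags. This is just a sketch assuming the standard text-generation-webui CLI flags; the model folder name is a placeholder for whatever sits in your models/ directory:

```
# Sketch: pass the loader and ExLlama context settings at launch.
# "MyModel-GPTQ" is a hypothetical folder name under models/.
python server.py --model MyModel-GPTQ --loader exllama --max_seq_len 4096 --compress_pos_emb 2
```

For context, compress_pos_emb is generally meant to be set to max_seq_len divided by the model's original context length, so 2 here matches stretching a 2048-context model to 4096.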

Hope this helps others who are struggling with performance.

zxbc2023 changed discussion status to closed
