Created by: [upstage](https://huggingface.co/upstage)
Quantized with Exllamav2 0.0.11 with the default dataset.

## My notes about this model:

I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically despite it being defined in the config file.
With high context it starts writing gibberish when RoPE scaling isn't set, so I checked it with 4x compress_pos_emb for 32k max context, and it was able to retrieve details from a 16000-token prompt.
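For intuition, compress_pos_emb corresponds to linear RoPE scaling: each position index is divided by the factor before the rotary embedding is computed, so a long sequence is squeezed into the position range the base model was trained on. A minimal sketch of the idea — the 8192 base range here simply follows from 32768 / 4, it is not read from this model's config:

```python
def scaled_positions(seq_len: int, compress_pos_emb: float) -> list[float]:
    """Linear RoPE scaling: divide each position index by the factor."""
    return [i / compress_pos_emb for i in range(seq_len)]

# With compress_pos_emb = 4, a 32768-token context maps into the
# 0..8192 position range, at the cost of coarser position resolution.
positions = scaled_positions(32768, 4.0)
print(max(positions))  # 8191.75
```

This is why the setting must actually be applied at load time: without it, positions past the trained range are fed in unscaled and the model degrades into gibberish.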
With my 12GB VRAM GPU I could load the model with about 30000 tokens of context, or the full 32768 tokens with the 8-bit cache option.
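The 8-bit cache option roughly halves KV-cache memory compared to FP16, which is why the full 32768 tokens fit where only about 30000 did before. A back-of-the-envelope sketch, using hypothetical Llama-style dimensions (32 layers, 8 KV heads, head dim 128 — illustrative only, not this model's actual config):

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    # Each layer stores one K and one V vector per token,
    # each of size n_kv_heads * head_dim elements.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical dimensions, for illustration only.
fp16 = kv_cache_bytes_per_token(32, 8, 128, 2)  # 16-bit cache
int8 = kv_cache_bytes_per_token(32, 8, 128, 1)  # 8-bit cache
print(fp16, int8)  # 131072 65536 -> the 8-bit cache halves it
```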
It's the first Yarn model that worked for me; perhaps other Yarn models required setting RoPE scaling manually too.