Created by: [upstage](https://huggingface.co/upstage)
Quantized with Exllamav2 0.0.11 with the default dataset.

## My notes about this model:

I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically despite it being defined in the config file.
With high context it starts writing gibberish when RoPE scaling isn't set, so I checked it with 4x compress_pos_emb for 32k max context, and it was able to retrieve details from a 16000-token prompt.
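For intuition, compress_pos_emb corresponds to linear RoPE scaling: each position index is divided by the factor before the rotary embedding is computed, so a long sequence is squeezed into the position range the base model was trained on. A minimal sketch of the idea — the 8192 base range here simply follows from 32768 / 4, it is not read from this model's config:

```python
def scaled_positions(seq_len: int, compress_pos_emb: float) -> list[float]:
    """Linear RoPE scaling: divide each position index by the factor."""
    return [i / compress_pos_emb for i in range(seq_len)]

# With compress_pos_emb = 4, a 32768-token context maps into the
# 0..8192 position range, at the cost of coarser position resolution.
positions = scaled_positions(32768, 4.0)
print(max(positions))  # 8191.75
```

This is why the setting must actually be applied at load time: without it, positions past the trained range are fed in unscaled and the model degrades into gibberish.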
With my 12GB VRAM GPU I could load the model with about 30000 tokens of context, or the full 32768 tokens with the 8-bit cache option.
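The 8-bit cache option roughly halves KV-cache memory compared to FP16, which is why the full 32768 tokens fit where only about 30000 did before. A back-of-the-envelope sketch, using hypothetical Llama-style dimensions (32 layers, 8 KV heads, head dim 128 — illustrative only, not this model's actual config):

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    # Each layer stores one K and one V vector per token,
    # each of size n_kv_heads * head_dim elements.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical dimensions, for illustration only.
fp16 = kv_cache_bytes_per_token(32, 8, 128, 2)  # 16-bit cache
int8 = kv_cache_bytes_per_token(32, 8, 128, 1)  # 8-bit cache
print(fp16, int8)  # 131072 65536 -> the 8-bit cache halves it
```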
It's the first Yarn model that worked for me; perhaps other Yarn models required setting RoPE scaling manually too.