---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- nsfw
---

## Description

ExLlamaV2 quant of [Undi95/PsyMedRP-v1-20B](https://huggingface.co/Undi95/PsyMedRP-v1-20B)

3 bits per weight (bpw), head bits set to 8

## VRAM

My VRAM usage with 20B models is:

| Bits per weight | Context | VRAM |
|--|--|--|
| 6bpw | 4k | 24 GB |
| 4bpw | 4k | 18 GB |
| 4bpw | 8k | 24 GB |
| 3bpw | 4k | 16 GB |
| 3bpw | 8k | 21 GB |

These numbers are rounded up rather than exact, and they were measured on a Windows machine.
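
As a rough cross-check on the table, the quantized weights alone take about `parameters × bits per weight / 8` bytes. The sketch below is a lower bound under stated assumptions, not a measurement: it treats the model as exactly 20 billion parameters, ignores the 8-bit head layer, and leaves out the context cache and runtime overhead, which is why the measured figures above are noticeably higher. The function name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, so the
    8-bit head layer, the context cache, and runtime overhead are ignored.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Compare the weight-only estimate for the bit widths listed in the table.
for bpw in (3, 4, 6):
    print(f"{bpw} bpw: ~{weight_size_gb(20, bpw):.1f} GB of weights")
```

The gap between these estimates and the table grows with context length, which matches the jump from 4k to 8k at the same bit width.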

## Prompt template

[Recommended reading](https://huggingface.co/lemonilia/LimaRP-Llama2-13B-v3-EXPERIMENT)

You can follow these instruction format settings in SillyTavern. Replace `tiny` with
your desired response length:

![settings](https://files.catbox.moe/6lcz0u.png)

### Message length control

Inspired by the previously named "Roleplay" preset in SillyTavern, starting from this
version of LimaRP it is possible to append a length modifier to the response instruction
sequence, like this:

```
### Input
User: {utterance}

### Response: (length = medium)
Character: {utterance}
```

This has an immediately noticeable effect on bot responses. The available lengths are:
`tiny`, `short`, `medium`, `long`, `huge`, `humongous`, `extreme`, `unlimited`. **The
recommended starting length is `medium`**. Keep in mind that the AI may ramble
or impersonate the user with very long messages.
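
If you assemble prompts yourself rather than through SillyTavern, the length modifier can be added with a small helper. This is a minimal sketch that covers only the input/response turn shown above, not the full instruction preamble described in the recommended reading. The names `build_turn`, `LENGTHS`, and the default speaker labels are illustrative assumptions, not part of any library.

```python
# Length keywords listed in the model card.
LENGTHS = ("tiny", "short", "medium", "long", "huge",
           "humongous", "extreme", "unlimited")


def build_turn(user_utterance: str, length: str = "medium",
               user: str = "User", character: str = "Character") -> str:
    """Format one user turn and open the character's reply with a length hint."""
    if length not in LENGTHS:
        raise ValueError(f"unknown length {length!r}; expected one of {LENGTHS}")
    return (
        "### Input\n"
        f"{user}: {user_utterance}\n\n"
        f"### Response: (length = {length})\n"
        f"{character}:"  # left open so the model writes the reply
    )


print(build_turn("Hello there.", length="short"))
```

Rejecting unknown keywords up front avoids silently sending a modifier the model was not trained on.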

The length control effect is reproducible, but messages will not follow the requested
lengths precisely. Instead, they fall within certain ranges on average, as seen in this table
with data from tests made with one reply at the beginning of the conversation:

![lengths](https://files.catbox.moe/dy39bt.png)

Response length control also appears to work well deep into the conversation.