File size: 16,737 Bytes
0b049e7 348a226 0b049e7 348a226 22e0a32 348a226 22e0a32 348a226 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 |
---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- unsloth
- axolotl
- exllamav2
- exl2
- 4bit
library_name: transformers
---
### quantized with the default exl2 dataset with sequence lengths of 8192 and 400 calibration (stage 2, optimisation) lines instead of 2048/100. possibly microwaved, presumably better.
##### resulstant measurement file is present somewhere, though the default line count of 16 (still extended to 8192) was used for measurement (stage 1)
### tokenizer works. tokenizer.model is not required for use with exllama2. no promises about sketchy software by "oobabooga"* :) try tabbyAPI/tavern, or exui if you don't miss CFG
##### consider yourselves lucky it's not a safetensors.zpaq this took all night to upload and YES i did refresh my access tokens after the Whoopsie, sorry!
###### *I'm sure it's fine it's just that I'll die if I ever see conda again.
---
# DreamGen Opus V1
<div style="display: flex; flex-direction: row; align-items: center;">
<img src="/dreamgen/opus-v1-34b/resolve/main/images/logo-1024.png" alt="model logo" style="
border-radius: 12px;
margin-right: 12px;
margin-top: 0px;
margin-bottom: 0px;
max-width: 100px;
height: auto;
"/>
Models for **(steerable) story-writing and role-playing**.
<br/>[All Opus V1 models, including quants](https://huggingface.co/collections/dreamgen/opus-v1-65d092a6f8ab7fc669111b31).
</div>
## Resources
- [**Opus V1 prompting guide**](https://dreamgen.com/docs/models/opus/v1) with many (interactive) examples and prompts that you can copy.
- [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing) for interactive role-play using `opus-v1.2-7b`.
- [Python code](example/prompt/format.py) to format the prompt correctly.
<img src="/dreamgen/opus-v1-34b/resolve/main/images/story_writing.webp" alt="story writing on dreamgen.com" style="
padding: 12px;
border-radius: 12px;
border: 2px solid #f9a8d4;
background: rgb(9, 9, 11);
"/>
## Prompting
<details>
<summary>The models use an extended version of ChatML.</summary>
```
<|im_start|>system
(Story description in the right format here)
(Typically consists of plot description, style description and characters)<|im_end|>
<|im_start|>user
(Your instruction on how the story should continue)<|im_end|>
<|im_start|>text names= Alice
(Continuation of the story from the Alice character)<|im_end|>
<|im_start|>text
(Continuation of the story from no character in particular (pure narration))<|im_end|>
<|im_start|>user
(Your instruction on how the story should continue)<|im_end|>
<|im_start|>text names= Bob
(Continuation of the story from the Bob character)<|im_end|>
```
The Opus V1 extension is the addition of the `text` role, and the addition / modification of role names.
Pay attention to the following:
- The `text` messages can (but don't have to have) `names`, names are used to indicate the "active" character during role-play.
- There can be multiple subsequent message with a `text` role, especially if names are involved.
- There can be multiple names attached to a message.
- The format for names is `names= {{name[0]}}; {{name[1]}}`, beware of the spaces after `names=` and after the `;`. This spacing leads to most natural tokenization for the names.
</details>
While the main goal for the models is great story-writing and role-playing performance, the models are also capable of several writing related tasks as well as general assistance.
Here's how you can prompt the model for the following tasks
- Steerable [Story-writing](https://dreamgen.com/docs/models/opus/v1#task-story-writing) and [Role-playing](https://dreamgen.com/docs/models/opus/v1#task-role-playing):
- Input:
- System prompt: You provide story / role-play description, which consists of:
- Plot description
- Style description
- Characters and their descriptions
- Conversation turns:
- Text / message turn: This represents part of the story or role play
- Instruction: This tells the model what should happen next
- Output: Continuation of the story / role-play.
- [Story plot summarization](https://dreamgen.com/docs/models/opus/v1#task-plot-description)
- Input: A story, or a few chapters of a story.
- Output: A description of the story or chapters.
- [Story character description](https://dreamgen.com/docs/models/opus/v1#task-char-description)
- Input: A story, or a few chapters of a story, set of characters.
- Output: A description of the characters.
- [Story style description](https://dreamgen.com/docs/models/opus/v1#task-style-description)
- Input: A story, or a few chapters of a story.
- Output: A description the style of the story.
- [Story description to chapters](https://dreamgen.com/docs/models/opus/v1#task-story-description-to-chapter-descriptions)
- Input: A brief plot description and the desired number of chapters.
- Output: A description for each chapter.
- And more...
### Sampling params
For story-writing and role-play, I recommend "Min P" based sampling with `min_p` in the range `[0.01, 0.1]` and with `temperature` in the range `[0.5, 1.5]`, depending on your preferences. A good starting point would be `min_p=0.1; temperature=0.8`.
You may also benefit from setting presence, frequency and repetition penalties, especially at lower temperatures.
## Dataset
The fine-tuning dataset consisted of ~100M tokens of steerable story-writing, role-playing, writing-assistant and general-assistant examples. Each example was up to 31000 tokens long.
All story-writing and role-playing examples were based on human-written text.
![token count distribution](images/token_count_cum__token_bucket.png)
## Running the model
The model is should be compatible with any software that supports the base model, but beware of prompting and tokenization.
I recommend using these model versions:
- 7B: [no quant (opus-v1.2-7b)](https://huggingface.co/dreamgen/opus-v1.2-7b)
- 34B: [no quant (opus-v1-34b)](https://huggingface.co/dreamgen/opus-v1-34b) or [awq (opus-v1-34b-awq)](https://huggingface.co/dreamgen/opus-v1-34b-awq)
### Running on DreamGen.com (free)
You can try the model for free on [dreamgen.com](https://dreamgen.com) — note that an account is required.
### Running Locally
- **Make sure your prompt is as close as possible to the Opus V1**
- Regardless of which backend you use, it's important that you format your prompt well and that the tokenization works correctly.
- [Read the prompt guide](https://dreamgen.com/docs/models/opus/v1)
- [Read the prompt formatting code](example/prompt/format.py)
- Make sure `<|im_start|>` and `<|im_end|>` are tokenized correctly
- **vLLM**
- [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing): This is a simple interactive Google Colab to do role-play with the 7B model, it should fit on the T4 GPU.
- [Code](example/prompt/interactive.py): This is simple script for interactive chat for one hard-coded scenario.
- **SillyTavern**
- [Settings](https://huggingface.co/{{REPO_ID}}/tree/main/configs/silly_tavern), v2 kindly provided by @MarinaraSpaghetti
- [Settings screenshot](configs/silly_tavern/settings_screenshot.webp)
- This is just an attempt at approximating the Opus V1 prompt, it won't be perfect
- **LM Studio**
- [Config](configs/lmstudio/preset.json)
- Just like ChatML, just changed "assistant" to "text" role.
- **HuggingFace**
- [Chat template](tokenizer_config.json#L51)
- Just like ChatML, just changed "assistant" to "text" role.
## Known Issues
- **34B tokenization**:
- There seems to be a mismatch between the tokenizer of the base and fine-tuned model. It's unclear whether this also affected training, or whether it's just incorrectly saved tokenizer (you can see `tokenizer.json` was not saved ([bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322))).
- This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
- Overall impact should be minor.
- **34B repetition**:
- The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
- **GGUF**:
- The conversion might be messed up and in my tests even `Q_8` of the `opus-v1.2-7b` is much worse than the `fp16` version.
- **Ooba**:
- The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
## Community
Join the DreamGen community on [**Discord**](https://dreamgen.com/discord) to get early access to new models.
## License
- This model is intended for personal use only, other use is not permitted.
---
# DreamGen Opus V1
<div style="display: flex; flex-direction: row; align-items: center;">
<img src="/dreamgen/opus-v1-34b/resolve/main/images/logo-1024.png" alt="model logo" style="
border-radius: 12px;
margin-right: 12px;
margin-top: 0px;
margin-bottom: 0px;
max-width: 100px;
height: auto;
"/>
Models for **(steerable) story-writing and role-playing**.
<br/>[All Opus V1 models, including quants](https://huggingface.co/collections/dreamgen/opus-v1-65d092a6f8ab7fc669111b31).
</div>
## Resources
- [**Opus V1 prompting guide**](https://dreamgen.com/docs/models/opus/v1) with many (interactive) examples and prompts that you can copy.
- [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing) for interactive role-play using `opus-v1.2-7b`.
- [Python code](example/prompt/format.py) to format the prompt correctly.
<img src="/dreamgen/opus-v1-34b/resolve/main/images/story_writing.webp" alt="story writing on dreamgen.com" style="
padding: 12px;
border-radius: 12px;
border: 2px solid #f9a8d4;
background: rgb(9, 9, 11);
"/>
## Prompting
<details>
<summary>The models use an extended version of ChatML.</summary>
```
<|im_start|>system
(Story description in the right format here)
(Typically consists of plot description, style description and characters)<|im_end|>
<|im_start|>user
(Your instruction on how the story should continue)<|im_end|>
<|im_start|>text names= Alice
(Continuation of the story from the Alice character)<|im_end|>
<|im_start|>text
(Continuation of the story from no character in particular (pure narration))<|im_end|>
<|im_start|>user
(Your instruction on how the story should continue)<|im_end|>
<|im_start|>text names= Bob
(Continuation of the story from the Bob character)<|im_end|>
```
The Opus V1 extension is the addition of the `text` role, and the addition / modification of role names.
Pay attention to the following:
- The `text` messages can (but don't have to have) `names`, names are used to indicate the "active" character during role-play.
- There can be multiple subsequent message with a `text` role, especially if names are involved.
- There can be multiple names attached to a message.
- The format for names is `names= {{name[0]}}; {{name[1]}}`, beware of the spaces after `names=` and after the `;`. This spacing leads to most natural tokenization for the names.
</details>
While the main goal for the models is great story-writing and role-playing performance, the models are also capable of several writing related tasks as well as general assistance.
Here's how you can prompt the model for the following tasks
- Steerable [Story-writing](https://dreamgen.com/docs/models/opus/v1#task-story-writing) and [Role-playing](https://dreamgen.com/docs/models/opus/v1#task-role-playing):
- Input:
- System prompt: You provide story / role-play description, which consists of:
- Plot description
- Style description
- Characters and their descriptions
- Conversation turns:
- Text / message turn: This represents part of the story or role play
- Instruction: This tells the model what should happen next
- Output: Continuation of the story / role-play.
- [Story plot summarization](https://dreamgen.com/docs/models/opus/v1#task-plot-description)
- Input: A story, or a few chapters of a story.
- Output: A description of the story or chapters.
- [Story character description](https://dreamgen.com/docs/models/opus/v1#task-char-description)
- Input: A story, or a few chapters of a story, set of characters.
- Output: A description of the characters.
- [Story style description](https://dreamgen.com/docs/models/opus/v1#task-style-description)
- Input: A story, or a few chapters of a story.
- Output: A description the style of the story.
- [Story description to chapters](https://dreamgen.com/docs/models/opus/v1#task-story-description-to-chapter-descriptions)
- Input: A brief plot description and the desired number of chapters.
- Output: A description for each chapter.
- And more...
### Sampling params
For story-writing and role-play, I recommend "Min P" based sampling with `min_p` in the range `[0.01, 0.1]` and with `temperature` in the range `[0.5, 1.5]`, depending on your preferences. A good starting point would be `min_p=0.1; temperature=0.8`.
You may also benefit from setting presence, frequency and repetition penalties, especially at lower temperatures.
## Dataset
The fine-tuning dataset consisted of ~100M tokens of steerable story-writing, role-playing, writing-assistant and general-assistant examples. Each example was up to 31000 tokens long.
All story-writing and role-playing examples were based on human-written text.
![token count distribution](images/token_count_cum__token_bucket.png)
## Running the model
The model is should be compatible with any software that supports the base model, but beware of prompting and tokenization.
I recommend using these model versions:
- 7B: [no quant (opus-v1.2-7b)](https://huggingface.co/dreamgen/opus-v1.2-7b)
- 34B: [no quant (opus-v1-34b)](https://huggingface.co/dreamgen/opus-v1-34b) or [awq (opus-v1-34b-awq)](https://huggingface.co/dreamgen/opus-v1-34b-awq)
### Running on DreamGen.com (free)
You can try the model for free on [dreamgen.com](https://dreamgen.com) — note that an account is required.
### Running Locally
- **Make sure your prompt is as close as possible to the Opus V1**
- Regardless of which backend you use, it's important that you format your prompt well and that the tokenization works correctly.
- [Read the prompt guide](https://dreamgen.com/docs/models/opus/v1)
- [Read the prompt formatting code](example/prompt/format.py)
- Make sure `<|im_start|>` and `<|im_end|>` are tokenized correctly
- **vLLM**
- [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing): This is a simple interactive Google Colab to do role-play with the 7B model, it should fit on the T4 GPU.
- [Code](example/prompt/interactive.py): This is simple script for interactive chat for one hard-coded scenario.
- **SillyTavern**
- [Settings](https://huggingface.co/{{REPO_ID}}/tree/main/configs/silly_tavern), v2 kindly provided by @MarinaraSpaghetti
- [Settings screenshot](configs/silly_tavern/settings_screenshot.webp)
- This is just an attempt at approximating the Opus V1 prompt, it won't be perfect
- **LM Studio**
- [Config](configs/lmstudio/preset.json)
- Just like ChatML, just changed "assistant" to "text" role.
- **HuggingFace**
- [Chat template](tokenizer_config.json#L51)
- Just like ChatML, just changed "assistant" to "text" role.
## Known Issues
- **34B tokenization**:
- There seems to be a mismatch between the tokenizer of the base and fine-tuned model. It's unclear whether this also affected training, or whether it's just incorrectly saved tokenizer (you can see `tokenizer.json` was not saved ([bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322))).
- This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
- Overall impact should be minor.
- **34B repetition**:
- The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
- **GGUF**:
- The conversion might be messed up and in my tests even `Q_8` of the `opus-v1.2-7b` is much worse than the `fp16` version.
- **Ooba**:
- The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
## Community
Join the DreamGen community on [**Discord**](https://dreamgen.com/discord) to get early access to new models.
## License
- This model is intended for personal use only, other use is not permitted. |