---
license: apache-2.0
datasets:
- openerotica/Lamia
tags:
- NSFW
- Porn
- Ecommerce
- Roleplay
- Summarization
---

This is a combination of the pruned erotica-analysis data, freedom-rp, and a subset of Airoboros.

The following categories were taken out of the Airoboros dataset and added to my own Lamia dataset:
"roleplay", "unalignment", "editor", "writing", "detailed_writing", "stylized_response", "unalign", "cot", "song"

I'm hoping that this can improve the model's narrative/storywriting ability, logic, and intelligence, while reducing any potential inherent ethical "alignment" that may be present in the base Mistral model from pretraining on ChatGPT-generated data.

The format is ChatML, and the base model is Yarn Mistral, which increases the context size to a true 16k+ tokens rather than relying on sliding window attention.
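For reference, ChatML wraps each conversation turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of rendering messages in that format (the helper name and example messages are illustrative, not part of this dataset):

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

# Illustrative example of a two-turn prompt in ChatML format.
prompt = to_chatml([
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening line of a story."},
])
print(prompt)
```

Training examples in this dataset follow the same turn structure, with the assistant's reply appearing as an `assistant` role turn.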