Nice
A successful model, although I personally prefer your 'L3-Rhaenys-2x8B'. With your recommended settings I can easily extend the context to 12k or even 16k (with temp 0.8) on both of these models.
I use DRY with its standard settings and turn off the standard rep_pen.
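For anyone who wants to copy this, here's a minimal sketch of those settings as a plain Python dict, assuming the usual DRY defaults (multiplier 0.8, base 1.75, allowed length 2); the exact parameter names depend on your frontend and are only illustrative, not an exact preset.

```python
# Rough sampler settings described above, as an illustrative Python dict.
# Parameter names are assumptions (frontends differ); DRY values are the
# commonly used defaults.
sampler_settings = {
    "temperature": 0.8,         # works for me up to 12k-16k context
    "repetition_penalty": 1.0,  # standard rep_pen disabled
    "dry_multiplier": 0.8,      # DRY on, standard settings
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}

print(sampler_settings)
```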
It hurts me to say this because I love your 'L3-Nymeria' (both versions), but 'Ellaria-9B' crushes it. I have a soft spot for Nymeria because it was the first RP model that made a huge impression on me (that's when I realized local models running on my PC could actually be good). But Gemma 2 9B is in a different class than Llama 3 8B; that's the conclusion I keep coming to when testing these models.
'Ellaria-9B' vs 'L3-Rhaenys-2x8B': the models are similarly smart (which makes 'Ellaria-9B' even more impressive). Both read character sheets very well and often use the information contained in them. 'L3-Rhaenys-2x8B' mixes up facts a little less often - by mixing up I mean changing a detail in later sentences (e.g. something was blue and now it's suddenly green). There isn't much of it, but it's there (that's my impression, I didn't measure it).

'Ellaria-9B' creates more diverse stories across multiple generations and tends (in my opinion) to be more perverted. It works like this: with character cards you have to be more careful about the information they contain, because the model is very sensitive to perverted/lewd content and eagerly uses it when building the story (so if the card encourages the character toward perverted behavior in several places, the result is a nymphomaniac). I wouldn't call this a flaw; the model is simply sensitive to these things. I mention it because many cards, with this model, give you characters who are horny and ready for anything. It's enough to tone down the behavior hints in the cards (to a more realistic level) and suddenly the character behaves more realistically. Likewise, in the story itself, 'Ellaria-9B' is eager to expand on the kinky threads that the user starts. 'L3-Rhaenys-2x8B' is gentler in this respect, which is why I prefer it.
But setting my preferences aside, 'Ellaria-9B' is amazing. She hasn't lost any sharpness compared to gemma-2-9b-it, and she can create much better scenes and descriptions of "questionably" correct situations. :-)
I dream of a model with similar capabilities but a much larger context (what a shame that Llama 3.1 is so... weird). Maybe you could try it with Mistral Nemo? (Nemo is a bit of a weird model; I've seen a lot of Nemo-based models that perform poorly.)
https://huggingface.co/TheDrummer/Rocinante-12B-v1.1
This one is interesting, probably the best I've tried with a large context. Unfortunately, I've noticed that Nemo-based models are weaker at logic, and it shows in the story and in how situations develop... maybe it's a matter of training. Rocinante is the smartest Nemo model I've seen in RP.
Well... I got a bit distracted... Thanks for this model and your other models. And good luck in the future.
To me, Gemma 2 felt incredibly refreshing at first after L3, but the more I used the model, the more the similarities started to emerge, and in the end the difference didn't seem so massive. I agree, Rhaenys 2x8B is overall slightly better than Ellaria. And yeah, Ellaria has TheDrummer's data, so it's super pervy and not so easy to balance with SLERP.
I'm working on my own ultrafeedback dataset at the moment. It's a completely different approach to prompts and the data they produce. My first victim will be Nemo 12B once the dataset is ready.