Merge failed

#3
by Autumnlight - opened

So this merge works slightly better for RP, but it's evident that it broke some internals.
I noticed that with 64k ctx the attention "lags" behind. Imagine it like the following: in message 10 you make a deal, in message 11 you talk about something else, then suddenly in message 14 it complains about the deal from message 10. This lag was evident multiple times. In addition, it sometimes generates empty tokens endlessly or falls back into weird patterns.
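Here's a rough sketch of the kind of exchange where we saw it, in case anyone wants to reproduce. It assumes an OpenAI-compatible local endpoint; the base URL and model name are placeholders, not anything from this repo.

```python
# Rough repro sketch of the "lagging attention" pattern: plant a topic early,
# move on, then see whether the model resurrects it as if it were current.
# Assumes an OpenAI-compatible local server (base_url and model name are
# placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="none")

history = [
    # "message 10": the deal
    {"role": "user", "content": "Deal: 50 gold for the map, agreed?"},
    {"role": "assistant", "content": "Agreed. 50 gold for the map."},
    # "messages 11-13": topic changes completely
    {"role": "user", "content": "Forget business for now. How's the weather tonight?"},
    {"role": "assistant", "content": "Clear skies, a cold wind from the north."},
    # "message 14": a healthy model stays on the current topic; the failure
    # mode is complaining about the old deal here as if it were live again
    {"role": "user", "content": "Shall we keep walking, then?"},
]

reply = client.chat.completions.create(model="new-dawn-llama3.1-70b", messages=history)
print(reply.choices[0].message.content)
```

In practice you'd pad the history out toward 64k tokens first; for us the lag only showed up at long ctx.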

I'm not surprised that it's falling apart at 64K context. New Dawn v1.0 was a hack job to extend the 8K context of Llama 3 to 32K, so I would expect it to compromise Llama 3.1's long context abilities.
Hopefully future merges with some Llama 3.1 finetunes will help it perform better beyond 32K context.
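For anyone curious about the mechanics: context extension of that sort usually boils down to RoPE scaling, i.e. stretching the positions so a 32K window maps onto the 8K range the model actually saw in training. A minimal sketch of the idea via the transformers config (illustrative only, not the exact recipe I used):

```python
# Illustrative only: linear RoPE scaling as exposed by transformers.
# A 4x factor squeezes 32K positions into the 8K range Llama 3 was
# trained on; it works, but long-range attention tends to get blurry,
# which fits the degradation described above.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # example base, not the merge
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "linear", "factor": 4.0}  # 8K * 4 = 32K
config.max_position_embeddings = 32768

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```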

EDIT: I should also note that I think v1.0 might still have better overall coherence as compared to v1.1. However, v1.1 seems to me to be a little more fun and creative. It always feels like a tradeoff between those two poles. The more a model sticks to the script, the less fun and creative it tends to be, yet if it gets too creative, then it's flying off the rails into absurdity. It feels a lot like CFG to me from text-to-image land. There's a sweet spot somewhere between total adherence and total spontaneity where something good and magical happens, and hitting that mark isn't always easy.
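If the CFG reference is unfamiliar, the knob in text-to-image land is just a scaled extrapolation between the unconditional and conditional predictions. A toy sketch, purely to make the analogy concrete (LLM sampling here doesn't literally do this):

```python
# Toy illustration of classifier-free guidance from text-to-image land.
def cfg(uncond, cond, scale):
    # scale = 0: pure unconditional output (total spontaneity, off the rails)
    # scale = 1: the plain conditional prediction
    # scale >> 1: exaggerated prompt adherence (sticks to the script, goes stale)
    return uncond + scale * (cond - uncond)
```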

Hmm, my settings may be off. I noticed that logic etc. partially suffered even when pushing it back down to 32k ctx. My buddies and I decided to stick with 3.0 for now. But thank you for your merge!
(Creativity-wise, if you give 3.0 a specific prompt to be ultra creative, it can really go wild; I think it's more a matter of correct prompting. As an example, feel free to try the following one for long-form novel writing:

You're a novel writer. You have two characters, {{char}} and {{user}}. Create a story between {{char}} and {{user}}. Include their thoughts, actions, and dialog. Be as creative as possible, respond in as much detail as possible, and make your responses as long as possible.

Pair this with renaming yourself to System and adding a second character using a character card, and the model goes insane.)
The multi-character chats of this model are also wicked; it's genuinely amazing how fucking well 3.0 handles all of that. We tested many 70B models, and from what we found, New Dawn reigns supreme by a hefty distance. As an example, many models can only understand direct relations but cannot notice the user shifting patterns and reevaluate situations. A different example: if a character gets flung through the room by a beast, your model handles it well (along with 2-3 others), while most of them just have characters say ouch once and call it a day. It could even represent wound healing over multiple in-dialog days.

I'm glad you're enjoying New Dawn v1.0 (the one that was based on Llama 3) so much. I need to go back and spend some more time playing around with it. It's definitely good, and even based on my limited comparisons I'd say it's better at following instructions.

@sophosympatheia After some further testing, we observed the same issue on other models as well. There may be a chance that Llama 3.1 is generally unstable due to suddenly shifting a natively 8k model up to a 128k max ctx (theory atm).
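For what it's worth, Llama 3.1's own config.json seems consistent with that: the positions are natively 8K and stretched 8x to reach 128K. Reproduced from memory, so double-check against the official repo:

```python
# rope_scaling block from Llama 3.1's config.json (from memory, verify
# against the meta-llama repo): an 8K-native position range scaled 8x.
rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
```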
