Merging Tips
Hey, I've got a few tips for merging that might help you with your 4.0 rendition of Stroganoff. Some researched, some I've found on my own.
Set 'normalize=false' so you have more control over the weights. By default, mergekit uses 'normalize=true', which rescales the weights relative to each other so that they sum to one. So with what you have it set to now, the weights are computed closer to '0.9, 0.3, 0.9' than '0.3, 0.1, 0.3'.
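For a rough feel of what that rescaling does, here is a minimal sketch of plain sum-to-one normalization. `normalize_weights` is a made-up helper for illustration, not a mergekit function. Plain rescaling puts '0.3, 0.1, 0.3' at roughly 0.43, 0.14, 0.43; as far as I understand, TIES-style methods divide per parameter by the weights of only the models that agree in sign, so the effective values can land higher than that.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (hypothetical helper, not mergekit API)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; nothing to normalize")
    return [w / total for w in weights]


configured = [0.3, 0.1, 0.3]
effective = normalize_weights(configured)

# The 3:1:3 ratio is preserved, but every value is scaled by 1 / sum(configured).
for raw, eff in zip(configured, effective):
    print(f"configured {raw:.2f} -> effective {eff:.3f}")
```

With 'normalize=false' the configured values would be used as typed, which is the extra control mentioned above.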
DARE TIES is great, but I've found that anything TIES-based loses a lot of the added models' characteristics to the base model. It has its uses and technically produces a better model. However, it's likely why Spellbound Instruct is breaking your merge, since the result retains a lot of the instruct model's story-heavy capabilities. Try setting one of the other models as the base with DARE Linear, then place Spellbound at the bottom of the stack. That should produce a more vibrant model.
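Here is a hedged sketch of what that suggestion could look like, written as a Python dict that mirrors the layout of a mergekit YAML config as I understand it. Every model name, weight, and density below is a placeholder, not a value from the actual Stroganoff recipe.

```python
import json

# All model names and numeric values are placeholders for illustration only.
config = {
    "merge_method": "dare_linear",
    # One of the non-Spellbound models acts as the base.
    "base_model": "your-org/other-model-a",
    "models": [
        {
            "model": "your-org/other-model-b",
            "parameters": {"weight": 0.3, "density": 0.5},
        },
        {
            # Spellbound goes last, i.e. at the bottom of the stack.
            "model": "your-org/spellbound-instruct",
            "parameters": {"weight": 0.3, "density": 0.5},
        },
    ],
    # Keep the weights as typed instead of rescaling them.
    "parameters": {"normalize": False},
    "dtype": "bfloat16",
}

print(json.dumps(config, indent=2))
```

The dict would still need to be written out as YAML before mergekit can read it; the point is only the ordering and which model sits in 'base_model'.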
If you want to retain as much of a model as possible in a merge, passthrough/frankenmerging is the best route, though it takes the longest to find the right recipe. Manually taking a slice out of one model and putting it into another sometimes works better than most merge methods. Just keep layer theory in mind: the first layers primarily control formatting and instruction following, the middle layers are where most of the detailed information sits, and the final layers have the most influence on the style of the output.
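As a rough illustration of the slicing idea, the sketch below builds a passthrough-style slice list from a simple plan. `build_passthrough_slices` is a hypothetical helper, the model names are placeholders, the 32-layer count is an assumption, and the 'slices' / 'sources' / 'layer_range' keys follow mergekit's config layout as I understand it.

```python
def build_passthrough_slices(plan, num_layers):
    """Turn (model, start, end) tuples into passthrough-style slice entries.

    Hypothetical helper: it only checks that each range fits inside a
    donor model with `num_layers` layers.
    """
    slices = []
    for model, start, end in plan:
        if not (0 <= start < end <= num_layers):
            raise ValueError(f"bad layer range {start}-{end} for {model}")
        slices.append({"sources": [{"model": model, "layer_range": [start, end]}]})
    return slices


# Placeholder names; both models are assumed to have 32 layers.
plan = [
    ("your-org/model-a", 0, 12),   # early layers: formatting / instruction following
    ("your-org/model-b", 12, 24),  # middle layers: detailed information
    ("your-org/model-a", 24, 32),  # final layers: output style
]

config = {
    "merge_method": "passthrough",
    "slices": build_passthrough_slices(plan, num_layers=32),
    "dtype": "bfloat16",
}

total_layers = sum(end - start for _, start, end in plan)
print(f"{len(config['slices'])} slices, {total_layers} layers in the result")
```

Where to cut is the part that takes the testing; the split points here are arbitrary.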
I'm by no means an expert, but hopefully some of that helps, and I'm open to any questions you have.
Is there a more in-depth tutorial about this?
Besides the videos covering the technical aspects of each merge method, I don't know of any, unfortunately. A lot of what I know comes from constantly testing and looking at what other people are doing for their merges, their configs, merge theory, etc.
Besides the videos covering the technical aspects of each merge method...
Could you share a video that beginners can understand?
https://www.youtube.com/watch?v=cvOpX75Kz4M&
https://www.youtube.com/watch?v=qbAvOgGmFuE
These two videos break down each method and go into the minor technical details but give you a better understanding overall. Hope they help.
Thanks
Set 'normalize=false' so you have more control over the weights. By default, mergekit uses 'normalize=true', which rescales the weights relative to each other so that they sum to one. So with what you have it set to now, the weights are computed closer to '0.9, 0.3, 0.9' than '0.3, 0.1, 0.3'.
I didn't even pay attention to this because every example I saw kept it at default, but I can't imagine how normalization is even useful in this scenario. During my tests, I would tweak one model's weight by 0.05 and its behavior would change in seemingly random and unexpected ways, and I suppose normalization was the culprit. I really don't understand why you would want other values to change when you just edit one. Thank you for bringing this to my attention.
No problem. Normalization is just there so you don't overload the weights, for example if the total weight is over 1.5 or so, since that would increase your chance of breaking a merge. You don't really notice it when the total weight is close to 1, but the moment you start playing with lower values, normalization kicks in hard.
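To make that concrete, here is a small sketch using plain sum-to-one rescaling as a stand-in for what normalization does (both helpers are made up for illustration, and mergekit's per-parameter behavior may differ). It shows that editing one weight by 0.05 moves all the effective weights, and that the rescaling factor grows as the total drops below 1.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    return [w / total for w in weights]


def scale_factor(weights):
    """Factor every weight is multiplied by under sum-to-one rescaling."""
    return 1.0 / sum(weights)


before = [0.3, 0.1, 0.3]
after = [0.3, 0.15, 0.3]  # only the middle weight was edited, by +0.05

before_eff = normalize_weights(before)
after_eff = normalize_weights(after)
print("before:", [round(w, 3) for w in before_eff])
print("after: ", [round(w, 3) for w in after_eff])

# Totals near 1 are barely touched; low totals get scaled up a lot.
for ws in ([0.5, 0.5], [0.3, 0.1, 0.3], [0.15, 0.05, 0.15]):
    print(f"total {sum(ws):.2f} -> weights multiplied by {scale_factor(ws):.2f}")
```

The first and third effective weights shift even though only the middle one was edited, which would fit the unexpected behavior changes described above.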