13b in the future?
I think the Mistral team said that they would train bigger ones
Why would you want 13b when the 7b model is outperforming 70b models? It clearly shows that performance isn't reliant on size but on how the model is structured, like an I4 engine matching a V8. What we need is the turbo or supercharger for these models while making them more sophisticated.
7b outperforming 70b doesn't mean that 13b could not do better.
Like with the same turbo or supercharger, a V8 will probably outperform an I4.
Agreed. A 13b might perform like another 100b+ model, and still be of a more than reasonable size.
I understand the ideology, 'make it bigger to make more power'. That's the same approach Google and OpenAI took, and now it requires supercomputers to run those models. Seeing a model with only 70b (Xwin-LM) surpass GPT-4's 1.7T on AlpacaEval, and Zephyr 7b match the 1.7T model on some benchmarks, should inspire people to focus more on refinement than on simply resizing. The biggest advantage of 7b models is efficiency, especially if you need to run multiple models for different agents at the same time on consumer-grade devices. Let's not forget the main goal of open-source AI: making it accessible for everyone.
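For a rough sense of what "consumer-grade" means here, a back-of-the-envelope memory estimate (just a sketch, assuming 4-bit quantization at ~0.5 bytes per parameter and ~20% runtime overhead; actual numbers vary with the runtime and context length):

```python
# Rough VRAM estimate for running quantized models locally.
# Assumptions: 4-bit weights (~0.5 bytes/param) plus ~20% overhead
# for KV cache and activations; real figures depend on the runtime.

def vram_gb(params_billions: float, bytes_per_param: float = 0.5, overhead: float = 1.2) -> float:
    """Approximate VRAM requirement in gigabytes."""
    return params_billions * bytes_per_param * overhead

for size in (7, 13, 70):
    print(f"{size}b model: ~{vram_gb(size):.1f} GB")

# 7b model:  ~4.2 GB  -> fits on a typical 8 GB consumer GPU
# 13b model: ~7.8 GB  -> borderline on 8 GB, comfortable on 12 GB
# 70b model: ~42.0 GB -> needs multiple GPUs or heavy offloading
```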
Adding a 13b isn't going to require a supercomputer; that is still more than approachable to us 'serfs'. And no one said anything about removing the smaller option, just adding more options.
You are right about not needing a supercomputer for 13b, but my point is not to shift from the current strategy, which is winning against models 100x bigger. If you need more raw power, there's Mixtral 8x7b: it has 42b parameters, making it over 3x bigger than the 13b you're asking for.