Tsukasa burns down a handicapped hospital.
Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if they pass internal tests.
i think you mean Qwen2 57B, I'm really interested in that MoE model and what it could do
How do you test your models internally? I'm a novice and trying to build RP models :) @alpindale
Really curious what Qwen2-57B-A14B can do when finetuned. It's the exact same size as Mixtral 8x7B, right? Eight 7B experts with two active ones.
Will they be available over on the pygsite for testing?
Qwen 57B would be bigger by a bit: 56.3B vs ~47B total parameters (I'm guessing they use a different MoE type? I can't find any papers stating what kind is used). Speed-wise it would be almost identical, as Mixtral has 13B active parameters vs 14B with Qwen.
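For anyone comparing the two, here's a tiny back-of-the-envelope sketch using the parameter counts quoted in this thread (totals are approximate, not pulled from the actual model configs):

```python
# Rough comparison of the two MoE models discussed above.
# Numbers are the approximate figures mentioned in this thread.

MODELS = {
    # name: (total params in billions, active params per token in billions)
    "Qwen2-57B-A14B": (56.3, 14.0),  # "A14B" = roughly 14B parameters active per token
    "Mixtral-8x7B": (46.7, 13.0),    # 8 experts with top-2 routing
}

for name, (total, active) in MODELS.items():
    print(f"{name}: {total:.1f}B total, {active:.1f}B active "
          f"({active / total:.0%} of weights used per token)")
```

So the two end up in the same speed class per token, even though Qwen2-57B has roughly 10B more weights to keep in memory.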
Qwen2-57B
Mixtral-8x7B
^ NyxKrage/LLM-Model-VRAM-Calculator, which now supports IQ quants :3
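For a rough sense of what a calculator like that is doing, here's a weights-only sketch (the function name and bits-per-weight values are illustrative; the real calculator also factors in context/KV cache and quant-specific overhead):

```python
# Weights-only VRAM estimate at a given bits-per-weight.
# Ignores KV cache, activations, and runtime overhead.

def weight_vram_gib(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3

# fp16 plus a few illustrative quant levels (bpw values are rough, not exact GGUF figures)
for bpw in (16, 8, 5, 4):
    print(f"Qwen2-57B at {bpw} bpw ≈ {weight_vram_gib(56.3, bpw):.1f} GiB")
```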