Inference code
#1, opened by jmjzz
Hello, I’m wondering whether this new version has been fine-tuned, so that we can run inference and evaluation on downstream tasks.
Both models need further fine-tuning to perform well. That holds for any MoE built with mergekit, whether the gates are initialized in hidden or random mode. However, the model with hidden gates will do better out of the box, and it should need less data and fewer iterations to reach good accuracy once you do fine-tune it.
That’s what I understood from my own MoE merges.
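To make the hidden vs. random difference concrete, here is a toy sketch in plain Python. It is not mergekit's implementation: the 3-dimensional vectors, the expert names, and the helpers `mean_vector` and `route` are all made up for illustration. It only assumes what the mergekit docs describe, namely that hidden mode derives the gate parameters from hidden-state representations of each expert's prompts, while random mode initializes them randomly.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def route(gates, hidden_state):
    """Return the index of the expert whose gate row scores highest."""
    scores = [dot(g, hidden_state) for g in gates]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical hidden states for prompts typical of each expert (toy data).
prompt_states = {
    "code_expert": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "math_expert": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}

# "hidden"-style gates: one row per expert, averaged from its prompt states.
hidden_gates = [mean_vector(states) for states in prompt_states.values()]

# "random"-style gates: unrelated to the experts until the router is trained.
rng = random.Random(0)
random_gates = [[rng.gauss(0, 1) for _ in range(3)] for _ in prompt_states]

token_state = [0.85, 0.15, 0.05]  # toy state that resembles the code prompts

print("hidden gates pick expert", route(hidden_gates, token_state))
print("random gates pick expert", route(random_gates, token_state))
```

With the hidden-style gates, the code-like token lands on the code expert because its gate row was built from similar states. With random gates the choice is arbitrary until fine-tuning teaches the router, which fits the point above about needing more data and iterations.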