Inference code

by jmjzz - opened Mar 15

Discussion

jmjzz

Mar 15

Hello, I’m wondering whether this new version has been fine-tuned, so that we can run inference and evaluation on downstream tasks.

MaziyarPanahi

Mar 17

Both models need further fine-tuning to perform well. That is generally the case when you build an MoE with mergekit, whether the gates are initialized in hidden or random mode. However, the model with hidden gates will do better without further fine-tuning, and it will need less data and fewer iterations to reach better accuracy.

That’s what I understood from my own MoE merges.
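
To make the hidden vs. random distinction concrete, here is a toy sketch in plain Python. It is not mergekit code. It assumes that hidden mode builds each expert's gate vector from hidden-state representations of prompts associated with that expert, while random mode starts from random gate weights. The vectors, the `route` helper, and the two-expert setup are all made up for illustration.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def route(gates, hidden):
    """Return the index of the expert whose gate vector scores highest."""
    scores = [dot(g, hidden) for g in gates]
    return scores.index(max(scores))


# Hypothetical stand-ins for hidden states of each expert's prompts.
# A real merge would get these from the base model, not from literals.
expert_prompt_states = [
    [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],  # expert 0
    [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],  # expert 1
]

# "hidden"-style init (assumed): gate vector = mean prompt hidden state.
hidden_gates = [mean_vector(states) for states in expert_prompt_states]

# "random"-style init: gate vectors carry no information about the experts.
rng = random.Random(0)
random_gates = [[rng.gauss(0.0, 1.0) for _ in range(3)]
                for _ in expert_prompt_states]

# Two inputs, each resembling one expert's prompts.
queries = [[0.95, 0.05, 0.05], [0.05, 0.05, 0.95]]

for q in queries:
    print("hidden gates ->", route(hidden_gates, q),
          "| random gates ->", route(random_gates, q))
```

In this toy setup the hidden-style gates already send each input to the matching expert, whereas the random gates only route sensibly by chance until they are trained. That is consistent with the point above that hidden gates start from a better place, though both still benefit from fine-tuning.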
