[EVALS] Metrics compared to 3.1-70b Instruct by Meta
#11
opened by ID0M
Meta's model is better?
To be clear - this model is NOT trained on any new data that has not been released before. Instead, we use previously published preference data (HelpSteer2) and a public reward model (https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward) to tune this model with REINFORCE for human preferences.
So, this model is not expected to be better at math, coding, etc. than the model we started with - Llama-3.1-70B-Instruct.
Instead, we expect (as indicated by Arena Hard, AlpacaEval, and MT-Bench) that humans may prefer this model's responses.
We are currently validating this hypothesis on the lmsys.org Chatbot Arena and will update the model card with Elo scores once we have them.
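For intuition, the REINFORCE-style tuning described above can be sketched on a toy problem. This is only an illustrative sketch, not NVIDIA's actual training recipe: the "policy" is a single logit over two candidate responses, and the `reward` function is a hypothetical stand-in for the scores a reward model like Llama-3.1-Nemotron-70B-Reward would assign. The policy gradient update pushes probability mass toward the higher-rewarded response.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for a reward model: it prefers response 1.
# In the real setup, a learned reward model scores full text responses.
def reward(action):
    return 1.0 if action == 1 else 0.0

theta = 0.0      # single logit; P(action 1) = sigmoid(theta)
lr = 0.1         # learning rate
baseline = 0.0   # running-average reward baseline to reduce variance

for step in range(500):
    p1 = 1.0 / (1.0 + math.exp(-theta))
    action = 1 if random.random() < p1 else 0   # sample from the policy
    r = reward(action)
    advantage = r - baseline                    # REINFORCE with baseline
    baseline += 0.05 * (r - baseline)
    # d/d(theta) log pi(action): (1 - p1) if action == 1 else -p1
    grad_logp = (1.0 - p1) if action == 1 else -p1
    theta += lr * advantage * grad_logp         # gradient ascent on reward

p1 = 1.0 / (1.0 + math.exp(-theta))
print(f"P(preferred response) after training: {p1:.2f}")
```

After training, the policy assigns most of its probability to the response the reward model prefers, which is the basic mechanism by which preference tuning shifts a model's outputs without new supervised data.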