[EVALS] Metrics compared to 3.1-70b Instruct by Meta
#11
opened by ID0M
Meta's model is better?
To be clear - this model is NOT trained on any new data that has not been released before. Instead, we use previously published preference data (HelpSteer2) and a public reward model (https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward) to tune this model with REINFORCE for human preferences.
So, this model is not expected to be better at math, coding, etc. than the model we started with - Llama-3.1-70B-Instruct.
Instead, we expect (as indicated by Arena Hard, AlpacaEval, and MT-Bench) that humans may prefer this model's responses.
We are currently validating this hypothesis on the lmsys.org Chatbot Arena and will update the model card with Elo scores once we have them.
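For intuition, the REINFORCE-style tuning described above can be sketched on a toy problem. This is only an illustrative sketch, not NVIDIA's actual training recipe: the "policy" is a single logit over two candidate responses, and the `reward` function is a hypothetical stand-in for the scores a reward model like Llama-3.1-Nemotron-70B-Reward would assign. The policy gradient update pushes probability mass toward the higher-rewarded response.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for a reward model: it prefers response 1.
# In the real setup, a learned reward model scores full text responses.
def reward(action):
    return 1.0 if action == 1 else 0.0

theta = 0.0      # single logit; P(action 1) = sigmoid(theta)
lr = 0.1         # learning rate
baseline = 0.0   # running-average reward baseline to reduce variance

for step in range(500):
    p1 = 1.0 / (1.0 + math.exp(-theta))
    action = 1 if random.random() < p1 else 0   # sample from the policy
    r = reward(action)
    advantage = r - baseline                    # REINFORCE with baseline
    baseline += 0.05 * (r - baseline)
    # d/d(theta) log pi(action): (1 - p1) if action == 1 else -p1
    grad_logp = (1.0 - p1) if action == 1 else -p1
    theta += lr * advantage * grad_logp         # gradient ascent on reward

p1 = 1.0 / (1.0 + math.exp(-theta))
print(f"P(preferred response) after training: {p1:.2f}")
```

After training, the policy assigns most of its probability to the response the reward model prefers, which is the basic mechanism by which preference tuning shifts a model's outputs without new supervised data.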