Can someone reproduce the accuracy of Llama 3.1 models?

#11
by damoict - opened

Meta claims the following 0-shot ARC-C accuracy for five models:

| Category | Benchmark | # Shots | Metric | Llama 3 8B Instruct | Llama 3.1 8B Instruct | Llama 3 70B Instruct | Llama 3.1 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|---|
| Reasoning | ARC-C | 0 | acc | 82.4 | 83.4 | 94.4 | 94.8 | 96.9 |

However, I downloaded their Hugging Face models and have never been able to reproduce such a high ARC-C accuracy; I usually get around 60% on ARC-Challenge even for the big models. Are there any suggestions for reproducing these results, especially the 94% 0-shot accuracy on ARC-Challenge? It seems hard to believe.

Thanks for all the suggestions.
[Screenshot: Meta's reported benchmark table, 2024-07-23]

I also only obtained around 56% accuracy on ARC-C with the LLaMA3-8B-Instruct model, using lm-eval. Did you find a way to reproduce the reported accuracy?
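
For reference, here is a minimal sketch of the 0-shot ARC-Challenge run via lm-eval's Python API (lm-evaluation-harness v0.4+); the model ID, dtype, and batch size are assumptions, so adjust them to your setup:

```python
# Minimal sketch: 0-shot ARC-Challenge with lm-evaluation-harness (v0.4+).
# The model ID, dtype, and batch size are assumptions; adjust to your hardware.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)

# lm-eval reports both 'acc' and 'acc_norm' for arc_challenge;
# the two can differ by several points, so check which one you compare against.
print(results["results"]["arc_challenge"])
```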

I also found that they say they used 25 shots for LLaMA3 on this page: https://github.com/meta-llama/llama3/blob/main/eval_details.md, yet in the screenshot you posted they say they used 0-shot.
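
To check whether the shot count explains the gap, the same sketch can be rerun with the 25-shot setting from that eval_details page (again, model ID and batch size are assumptions):

```python
# Same sketch as above, but with the 25-shot setting from Meta's eval_details page.
results_25 = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results_25["results"]["arc_challenge"])
```

Even with matched shot counts, differences in prompt format and scoring between Meta's internal harness and lm-eval's multiple-choice loglikelihood setup could still move ARC-C numbers noticeably, so an exact match is not guaranteed.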
