Can anyone reproduce the reported accuracy of the Llama 3.1 models?
Meta claims the following 0-shot ARC-C accuracies (Reasoning, ARC-C, 0-shot, acc) for five models:
- Llama 3 8B Instruct: 82.4
- Llama 3.1 8B Instruct: 83.4
- Llama 3 70B Instruct: 94.4
- Llama 3.1 70B Instruct: 94.8
- Llama 3.1 405B Instruct: 96.9
But I downloaded their Hugging Face models and have never been able to reproduce such high ARC-C accuracy; for those models, ARC-Challenge accuracy usually comes out around 60%. Are there any suggestions for reproducing these results, especially the 94% 0-shot accuracy on ARC-Challenge? It seems quite unbelievable.
I also only got around 56% accuracy on ARC-C with the LLaMA3-8B-Instruct model, using lm-eval (a sketch of my invocation is below). Have you found a way to reproduce the reported accuracy?
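For reference, this is roughly how I ran it, via the lm-evaluation-harness v0.4.x Python API; the dtype and batch size are just my local settings, not anything Meta specifies:

```python
# Sketch of my lm-eval run (lm-evaluation-harness v0.4.x).
# Assumes access to the HF checkpoint meta-llama/Meta-Llama-3-8B-Instruct.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct,dtype=bfloat16",
    tasks=["arc_challenge"],  # the harness scores this as loglikelihood multiple-choice
    num_fewshot=0,            # matches the 0-shot setting in Meta's table
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # acc / acc_norm for ARC-Challenge
```

With this setup I get the ~56% acc mentioned above.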
I also found that they say they used 25 shots for LLaMA3 on this page: https://github.com/meta-llama/llama3/blob/main/eval_details.md, yet in the screenshot you posted they say they used 0-shot.
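If you want to test whether the few-shot count alone explains the gap, rerunning with the 25-shot setting from that page is a one-parameter change in the sketch above (same assumptions about model and settings):

```python
# Same assumed setup as above, but with the 25-shot setting from eval_details.md.
import lm_eval

results_25 = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,  # 25-shot, per eval_details.md
    batch_size=8,
)
print(results_25["results"]["arc_challenge"])
```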