sarath-shekkizhar
commited on
Commit
•
de770dc
1
Parent(s):
b36ec51
Update README.md
Browse files
README.md
CHANGED
@@ -106,7 +106,7 @@ Arena-Hard is an evaluation tool for instruction-tuned LLMs containing 500 chall
|
|
106 |
We now present our results on the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) used for benchmarking Open LLM Leaderboard on Hugging Face.
|
107 |
The task involves evaluation on `6` key benchmarks across reasoning and knowledge with different *few-shot* settings. Read more details about the benchmark at [the leaderboard page](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
|
108 |
|
109 |
-
|
|
110 |
| --- | --- | --- | --- | --- | --- | --- | --- |
|
111 |
| **Llama3-TenyxChat-70B** | **79.43** | 72.53 | 86.11 | 79.95 | 62.93 | 83.82 | 91.21 |
|
112 |
| *Llama3-70B-Instruct* | 77.88 | 71.42 | 85.69 | 80.06 | 61.81 | 82.87 | 85.44 |
|
|
|
106 |
We now present our results on the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) used for benchmarking Open LLM Leaderboard on Hugging Face.
|
107 |
The task involves evaluation on `6` key benchmarks across reasoning and knowledge with different *few-shot* settings. Read more details about the benchmark at [the leaderboard page](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
|
108 |
|
109 |
+
| Model-name | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|
110 |
| --- | --- | --- | --- | --- | --- | --- | --- |
|
111 |
| **Llama3-TenyxChat-70B** | **79.43** | 72.53 | 86.11 | 79.95 | 62.93 | 83.82 | 91.21 |
|
112 |
| *Llama3-70B-Instruct* | 77.88 | 71.42 | 85.69 | 80.06 | 61.81 | 82.87 | 85.44 |
|