Adding Evaluation Results

#9
Files changed (1)

1. README.md (+14 −1)
README.md CHANGED

@@ -71,4 +71,17 @@ Beware of hallucinations: Outputs are often factually wrong or misleading.
 Replies might look convincing (at first glance) while containing completely
 made up false statements.
 
-This model is usable only for English conversations.
+This model is usable only for English conversations.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_OpenAssistant__oasst-sft-1-pythia-12b)
+
+| Metric               | Value |
+|----------------------|-------|
+| Avg.                 | 35.84 |
+| ARC (25-shot)        | 46.42 |
+| HellaSwag (10-shot)  | 70.0  |
+| MMLU (5-shot)        | 26.19 |
+| TruthfulQA (0-shot)  | 39.19 |
+| Winogrande (5-shot)  | 62.19 |
+| GSM8K (5-shot)       | 0.61  |
+| DROP (3-shot)        | 6.3   |
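The reported "Avg." is consistent with the unweighted mean of the seven individual benchmark scores. A quick sanity check (this assumes a plain arithmetic mean, which the diff itself does not state):

```python
# Leaderboard scores from the table above.
scores = {
    "ARC (25-shot)": 46.42,
    "HellaSwag (10-shot)": 70.0,
    "MMLU (5-shot)": 26.19,
    "TruthfulQA (0-shot)": 39.19,
    "Winogrande (5-shot)": 62.19,
    "GSM8K (5-shot)": 0.61,
    "DROP (3-shot)": 6.3,
}

# Unweighted mean, rounded to two decimals like the leaderboard.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 35.84
```

The match with the table's 35.84 suggests no score was dropped or reweighted when the average was computed.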