Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +119 -3

@@ -5,16 +5,119 @@ language:
 - 'no'
 - da
 license: mit
-models:
-  - timpal0l/Mistral-7B-v0.1-flashback-v2-instruct
 tags:
 - pretrained
 - flashback
 - web
 - conversational
+models:
+- timpal0l/Mistral-7B-v0.1-flashback-v2-instruct
 pipeline_tag: text-generation
 widget:
 - text: Jag tycker att det är roligt med
+model-index:
+- name: Mistral-7B-v0.1-flashback-v2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 57.17
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 80.74
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 40.66
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 77.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 29.42
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=timpal0l/Mistral-7B-v0.1-flashback-v2
+      name: Open LLM Leaderboard
 ---
 # 🐈‍⬛ Mistral-7B-v0.1-flashback-v2
@@ -180,4 +283,17 @@ Tack och lov för apparna som jag kunde leda oss efter. Att åka kollektivt hade
 Tack ska ni ha för tipsen, igen. Tack till Stockholm för att ni tog emot oss med respekt han var så nöjd med resan.
 Hej så länge, vi kommer åter i framtiden! 😁
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_timpal0l__Mistral-7B-v0.1-flashback-v2)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |57.53|
+|AI2 Reasoning Challenge (25-Shot)|57.17|
+|HellaSwag (10-Shot)              |80.74|
+|MMLU (5-Shot)                    |59.98|
+|TruthfulQA (0-shot)              |40.66|
+|Winogrande (5-shot)              |77.19|
+|GSM8k (5-shot)                   |29.42|
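
As a quick sanity check on the table above, the `Avg.` row can be reproduced from the six benchmark scores. This is a minimal sketch that assumes `Avg.` is the unweighted arithmetic mean rounded to two decimals. The PR does not state how the value is computed, but the numbers are consistent with that assumption. The helper name `leaderboard_average` is hypothetical and is not part of the leaderboard tooling.

```python
# Benchmark scores copied from the results table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 57.17,
    "HellaSwag (10-Shot)": 80.74,
    "MMLU (5-Shot)": 59.98,
    "TruthfulQA (0-shot)": 40.66,
    "Winogrande (5-shot)": 77.19,
    "GSM8k (5-shot)": 29.42,
}


def leaderboard_average(values):
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: the table's "Avg." row is a plain arithmetic mean.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")  # Avg. = 57.53
```

The result matches the `Avg.` value of 57.53 reported in the table.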