Files changed (1)
  1. README.md +119 -3
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
+ language:
+ - en
  license: other
  tags:
  - axolotl
@@ -48,8 +50,109 @@ datasets:
  - Open-Orca/SlimOrca
  - migtissera/Synthia-v1.3
  - TIGER-Lab/ScienceEval
- language:
- - en
+ model-index:
+ - name: Einstein-v4-phi2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 59.98
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 74.07
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 56.89
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 45.8
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 73.88
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 53.98
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-phi2
+       name: Open LLM Leaderboard
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/Z32gXhbukH-L7SB1TQ6Sb.png)
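The `model-index` block added above follows the standard Hugging Face model-card metadata schema, so the per-benchmark scores can be read back programmatically from the card's YAML front matter. A minimal sketch (assuming PyYAML is installed; only the first results entry is reproduced for brevity):

```python
import yaml  # PyYAML

# First entry of the model-index block added in the hunk above, truncated for brevity.
front_matter = """
model-index:
- name: Einstein-v4-phi2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 59.98
      name: normalized accuracy
"""

meta = yaml.safe_load(front_matter)
for result in meta["model-index"][0]["results"]:
    dataset = result["dataset"]["name"]
    for metric in result["metrics"]:
        print(f'{dataset}: {metric["type"]} = {metric["value"]}')
# -> AI2 Reasoning Challenge (25-Shot): acc_norm = 59.98
```

The same metadata can also be pulled straight from the Hub with `huggingface_hub`'s `ModelCard.load("Weyaxi/Einstein-v4-phi2")`, which handles the front-matter parsing for you.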
 
@@ -230,4 +333,17 @@ Thanks to all open source AI community.

  If you would like to support me:

- [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v4-phi2)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |60.77|
+ |AI2 Reasoning Challenge (25-Shot)|59.98|
+ |HellaSwag (10-Shot) |74.07|
+ |MMLU (5-Shot) |56.89|
+ |TruthfulQA (0-shot) |45.80|
+ |Winogrande (5-shot) |73.88|
+ |GSM8k (5-shot) |53.98|
+
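For reference, the `Avg.` row in the table above is the arithmetic mean of the six benchmark scores; a quick check in plain Python:

```python
# Scores copied from the leaderboard table above; the reported average is their arithmetic mean.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 59.98,
    "HellaSwag (10-Shot)": 74.07,
    "MMLU (5-Shot)": 56.89,
    "TruthfulQA (0-shot)": 45.80,
    "Winogrande (5-shot)": 73.88,
    "GSM8k (5-shot)": 53.98,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 60.77, matching the Avg. row
```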