yuvraj17 committed on
Commit
998d15b
1 Parent(s): 0e081d5

Added Eval-Scores

Files changed (1)
  1. README.md +63 -2
README.md CHANGED
@@ -105,9 +105,70 @@ print(outputs[0]["generated_text"])
```

## 🏆 Evaluation Scores
- Coming soon
+
+ ### Nous
+
+ | Model |AGIEval|TruthfulQA|Bigbench|
+ |----------------------------------------------------------------------------------------------------------------|------:|---------:|-------:|
+ |[Llama3-8B-SuperNova-Spectrum-dare_ties](https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties)| 38.32| 57.15| 43.91|
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |20.47|± | 2.54|
+ | | |acc_norm|18.50|± | 2.44|
+ |agieval_logiqa_en | 0|acc |35.94|± | 1.88|
+ | | |acc_norm|35.64|± | 1.88|
+ |agieval_lsat_ar | 0|acc |21.74|± | 2.73|
+ | | |acc_norm|20.00|± | 2.64|
+ |agieval_lsat_lr | 0|acc |41.37|± | 2.18|
+ | | |acc_norm|40.98|± | 2.18|
+ |agieval_lsat_rc | 0|acc |59.11|± | 3.00|
+ | | |acc_norm|56.13|± | 3.03|
+ |agieval_sat_en | 0|acc |63.59|± | 3.36|
+ | | |acc_norm|60.19|± | 3.42|
+ |agieval_sat_en_without_passage| 0|acc |40.29|± | 3.43|
+ | | |acc_norm|37.38|± | 3.38|
+ |agieval_sat_math | 0|acc |38.64|± | 3.29|
+ | | |acc_norm|37.73|± | 3.28|
+
+ Average: 38.32%
+
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |38.43|± | 1.70|
+ | | |mc2 |57.15|± | 1.50|
+
+ Average: 57.15%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|58.42|± | 3.59|
+ |bigbench_date_understanding | 0|multiple_choice_grade|70.73|± | 2.37|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|47.35|± | 2.64|
+ | | |exact_str_match | 0.00|± | 0.00|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|29.00|± | 2.03|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|21.00|± | 1.54|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|51.33|± | 2.89|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|33.20|± | 2.11|
+ |bigbench_navigate | 0|multiple_choice_grade|55.40|± | 1.57|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|66.35|± | 1.06|
+ |bigbench_ruin_names | 0|multiple_choice_grade|45.76|± | 2.36|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|28.26|± | 1.43|
+ |bigbench_snarks | 0|multiple_choice_grade|62.43|± | 3.61|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|50.30|± | 1.59|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|48.00|± | 1.58|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|23.60|± | 1.20|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.66|± | 0.91|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|51.33|± | 2.89|
+
+ Average: 43.91%
+
 


## Special thanks & Reference
- - Maxime Labonne for their easy-to-use colab-notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb) and [Blog](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54)
+ - Maxime Labonne for his easy-to-use Colab notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb), the accompanying [blog post](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54), and the [LLM AutoEval notebook](https://github.com/mlabonne/llm-autoeval)
- Authors of [Mergekit](https://github.com/arcee-ai/mergekit)
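
The table format above matches the output of Maxime Labonne's [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) (credited in the references), which wraps EleutherAI's lm-evaluation-harness. Below is a minimal sketch of re-running one of these benchmarks directly through the harness's Python API; the model id comes from the Nous table above, while the lm-eval version (0.4.x), the task names, and the call shown are assumptions for illustration, not the exact setup behind this commit.

```python
# Hypothetical reproduction sketch, not part of the commit above.
# Assumes lm-eval 0.4.x (pip install lm-eval); task names in recent
# releases differ slightly from those printed in the tables
# (e.g. truthfulqa_mc is split into truthfulqa_mc1/truthfulqa_mc2).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties",
    tasks=["truthfulqa_mc1", "truthfulqa_mc2"],  # extend with agieval_*/bigbench_* tasks
    batch_size="auto",
)

# Each entry holds per-task metrics like those tabulated above
# (acc, acc_norm, mc1, mc2, multiple_choice_grade, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```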