yuvraj17 committed on
Commit
998d15b
1 Parent(s): 0e081d5

Added Eval-Scores

Files changed (1)
  1. README.md +63 -2
README.md CHANGED
@@ -105,9 +105,70 @@ print(outputs[0]["generated_text"])
```

## 🏆 Evaluation Scores
- Coming soon
+
+ ### Nous
+
+ | Model |AGIEval|TruthfulQA|Bigbench|
+ |----------------------------------------------------------------------------------------------------------------|------:|---------:|-------:|
+ |[Llama3-8B-SuperNova-Spectrum-dare_ties](https://huggingface.co/yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties)| 38.32| 57.15| 43.91|
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |20.47|± | 2.54|
+ | | |acc_norm|18.50|± | 2.44|
+ |agieval_logiqa_en | 0|acc |35.94|± | 1.88|
+ | | |acc_norm|35.64|± | 1.88|
+ |agieval_lsat_ar | 0|acc |21.74|± | 2.73|
+ | | |acc_norm|20.00|± | 2.64|
+ |agieval_lsat_lr | 0|acc |41.37|± | 2.18|
+ | | |acc_norm|40.98|± | 2.18|
+ |agieval_lsat_rc | 0|acc |59.11|± | 3.00|
+ | | |acc_norm|56.13|± | 3.03|
+ |agieval_sat_en | 0|acc |63.59|± | 3.36|
+ | | |acc_norm|60.19|± | 3.42|
+ |agieval_sat_en_without_passage| 0|acc |40.29|± | 3.43|
+ | | |acc_norm|37.38|± | 3.38|
+ |agieval_sat_math | 0|acc |38.64|± | 3.29|
+ | | |acc_norm|37.73|± | 3.28|
+
+ Average: 38.32%
+
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |38.43|± | 1.70|
+ | | |mc2 |57.15|± | 1.50|
+
+ Average: 57.15%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|58.42|± | 3.59|
+ |bigbench_date_understanding | 0|multiple_choice_grade|70.73|± | 2.37|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|47.35|± | 2.64|
+ | | |exact_str_match | 0.00|± | 0.00|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|29.00|± | 2.03|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|21.00|± | 1.54|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|51.33|± | 2.89|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|33.20|± | 2.11|
+ |bigbench_navigate | 0|multiple_choice_grade|55.40|± | 1.57|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|66.35|± | 1.06|
+ |bigbench_ruin_names | 0|multiple_choice_grade|45.76|± | 2.36|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|28.26|± | 1.43|
+ |bigbench_snarks | 0|multiple_choice_grade|62.43|± | 3.61|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|50.30|± | 1.59|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|48.00|± | 1.58|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|23.60|± | 1.20|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.66|± | 0.91|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|51.33|± | 2.89|
+
+ Average: 43.91%
+
 


## Special thanks & Reference
- - Maxime Labonne for their easy-to-use colab-notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb) and [Blog](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54)
+ - Maxime Labonne for his easy-to-use Colab notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb), the accompanying [blog post](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54), and the [LLM AutoEval notebook](https://github.com/mlabonne/llm-autoeval)
- Authors of [Mergekit](https://github.com/arcee-ai/mergekit)
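
The table format above matches the output of Maxime Labonne's [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) (credited in the references), which wraps EleutherAI's lm-evaluation-harness. Below is a minimal sketch of re-running one of these benchmarks directly through the harness's Python API; the model id comes from the Nous table above, while the lm-eval version (0.4.x), the task names, and the call shown are assumptions for illustration, not the exact setup behind this commit.

```python
# Hypothetical reproduction sketch, not part of the commit above.
# Assumes lm-eval 0.4.x (pip install lm-eval); task names in recent
# releases differ slightly from those printed in the tables
# (e.g. truthfulqa_mc is split into truthfulqa_mc1/truthfulqa_mc2).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties",
    tasks=["truthfulqa_mc1", "truthfulqa_mc2"],  # extend with agieval_*/bigbench_* tasks
    batch_size="auto",
)

# Each entry holds per-task metrics like those tabulated above
# (acc, acc_norm, mc1, mc2, multiple_choice_grade, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```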