Spaces: CoreyMorris/MMLU-by-task-Leaderboard
Corey Morris committed on Aug 24, 2023

Commit d396c1e • 1 parent: 73da8d6

Updated to reflect number of models. Previously, I think there were duplicates

Files changed (1)

app.py CHANGED

@@ -123,11 +123,11 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
 data_provider = ResultDataProcessor()
 # st.title('Model Evaluation Results including MMLU by task')
-st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 1100+ Open Source Models Across 57 Diverse Evaluation Tasks')
+st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 900+ Open Source Models Across 57 Diverse Evaluation Tasks')
 st.markdown("""***Last updated August 22th***""")
 st.markdown("""**Models that are suspected to have training data contaminated with evaluation data have been removed.**""")
 st.markdown("""
-            Hugging Face has run evaluations on over 800 open source models and provides results on a
+            Hugging Face has run evaluations on over 900 open source models and provides results on a
             [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
             The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
             This app provides a way to explore the results for individual tasks and compare models across tasks.
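
Review note: the commit message attributes the earlier figure to suspected duplicates, and the count is hard-coded in two places. As a minimal sketch that is not part of this commit, the figure could be derived from the loaded results instead. It assumes the results table is a pandas DataFrame indexed by model name; `count_unique_models`, `round_down_label`, and the sample data are hypothetical and do not appear in app.py.

```python
import pandas as pd


def count_unique_models(df: pd.DataFrame) -> int:
    """Count distinct models, assuming the index holds the model names."""
    # nunique() counts each index label once, so duplicate rows are ignored.
    return int(df.index.nunique())


def round_down_label(count: int, step: int = 100) -> str:
    """Format a count as a conservative 'N+' label, e.g. 957 -> '900+'."""
    return f"{(count // step) * step}+"


# Hypothetical results table with one duplicated model row.
results = pd.DataFrame(
    {"MMLU_average": [0.45, 0.45, 0.52]},
    index=["model-a", "model-a", "model-b"],
)

# step=1 only because the sample is tiny; the default rounds to hundreds.
label = round_down_label(count_unique_models(results), step=1)
title = f"An Interactive Portal for Analyzing {label} Open Source Models"
```

The resulting `title` string could then be passed to `st.title`, so the title and the body text stay consistent with the data after each refresh.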