Corey Morris committed
Commit d396c1e
1 Parent(s): 73da8d6
Updated to reflect number of models. Previously, I think there were duplicates
app.py
CHANGED
@@ -123,11 +123,11 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
 data_provider = ResultDataProcessor()
 
 # st.title('Model Evaluation Results including MMLU by task')
-st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing
+st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 900+ Open Source Models Across 57 Diverse Evaluation Tasks')
 st.markdown("""***Last updated August 22th***""")
 st.markdown("""**Models that are suspected to have training data contaminated with evaluation data have been removed.**""")
 st.markdown("""
-Hugging Face has run evaluations on over
+Hugging Face has run evaluations on over 900 open source models and provides results on a
 [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
 The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
 This app provides a way to explore the results for individual tasks and compare models across tasks.
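Review note: the commit message suggests the earlier model count was inflated by duplicate entries, and this change corrects the hard-coded figure by hand. One possible alternative is to derive the figure from deduplicated model names so the title and intro text cannot drift from the data. The following is only a minimal sketch under stated assumptions: `count_unique_models` and `rounded_count_label` are hypothetical helpers that do not exist in app.py, and the sample names are made up for illustration.

```python
def count_unique_models(model_names):
    """Count distinct model names, ignoring surrounding whitespace and exact duplicates."""
    return len({name.strip() for name in model_names})


def rounded_count_label(count, step=100):
    """Round down to a multiple of `step` and append '+' (957 becomes '900+').

    Counts smaller than `step` are returned unchanged as a plain string.
    """
    if count < step:
        return str(count)
    return f"{(count // step) * step}+"


# Hypothetical sample data; in the app the names would come from the loaded results.
names = ["org-a/model-1", "org-a/model-1", "org-b/model-2 ", "org-b/model-2"]
unique = count_unique_models(names)
label = rounded_count_label(unique)

# The label could then be interpolated into the title instead of a literal "900+".
title = f"An Interactive Portal for Analyzing {label} Open Source Models Across 57 Diverse Evaluation Tasks"
```

Whether this is worth doing depends on how the results are loaded; if the model list is not available at the point where `st.title` is called, keeping the literal and updating it per commit, as done here, is the simpler choice.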