Corey Morris committed
Commit 18ec1ba • Parent(s): fb25b1e
Modified title and explanation to better reflect what the site is
app.py
CHANGED
@@ -104,12 +104,14 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
 data_provider = ResultDataProcessor()
 
 # st.title('Model Evaluation Results including MMLU by task')
-st.title('
-st.markdown("""***Last updated August
+st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 700+ Open Source Models Across 57 Diverse Evaluation Tasks')
+st.markdown("""***Last updated August 15th***""")
 st.markdown("""
-Hugging Face has run evaluations on over
+Hugging Face has run evaluations on over 700 open source models and provides results on a
 [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
-The leaderboard currently displays the overall result for
+The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
+This app provides a way to explore the results for individual tasks and compare models across tasks.
+There are 57 tasks in the MMLU evaluation that cover a wide variety of subjects including Science, Math, Humanities, Social Science, Applied Science, Logic, and Security.
 [Preliminary analysis of MMLU-by-Task data](https://coreymorrisdata.medium.com/preliminary-analysis-of-mmlu-evaluation-data-insights-from-500-open-source-models-e67885aa364b)
 """)
 
@@ -341,7 +343,7 @@ st.markdown("***Thank you to hugging face for running the evaluations and supply
 st.markdown("""
 # Citation
 
-1. Corey Morris (2023). *
+1. Corey Morris (2023). *Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 700+ Open Source Models Across 57 Diverse Evaluation Tasks*. [link](https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard)
 
 2. Edward Beeching, Clémentine Fourrier, Nathan Habib, Sheon Han, Nathan Lambert, Nazneen Rajani, Omar Sanseviero, Lewis Tunstall, Thomas Wolf. (2023). *Open LLM Leaderboard*. Hugging Face. [link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 