Corey Morris committed
Commit: 2037152
Parent(s): 9ecc99c

check for URL and full model name

Files changed (1): app.py (+6, −5)
app.py CHANGED

@@ -115,9 +115,6 @@ st.title('Interactive Portal for Analyzing Open Source Large Language Models')
 st.markdown("""***Last updated October 6th***""")
 st.markdown("""**Models that are suspected to have training data contaminated with evaluation data have been removed.**""")
 st.markdown("""
-Hugging Face runs evaluations on open source models and provides results on a
-[publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
-The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
 This page provides a way to explore the results for individual tasks and compare models across tasks. Data for the benchmarks hellaswag, arc_challenge, and truthfulQA have also been included for comparison.
 There are 57 tasks in the MMLU evaluation that cover a wide variety of subjects including Science, Math, Humanities, Social Science, Applied Science, Logic, and Security.
 [Preliminary analysis of MMLU-by-Task data](https://coreymorrisdata.medium.com/preliminary-analysis-of-mmlu-evaluation-data-insights-from-500-open-source-models-e67885aa364b)
@@ -260,9 +257,13 @@ st.markdown("***The dashed red line indicates random chance accuracy of 0.25 as
 st.markdown("***")
 st.write("As expected, there is a strong positive relationship between the number of parameters and average performance on the MMLU evaluation.")
 
+
 column_list_for_plotting = filtered_data.columns.tolist()
-column_list_for_plotting.remove('URL')
-column_list_for_plotting.remove('full_model_name')
+if 'URL' in column_list_for_plotting:
+    column_list_for_plotting.remove('URL')
+if 'full_model_name' in column_list_for_plotting:
+    column_list_for_plotting.remove('full_model_name')
+
 selected_x_column = st.selectbox('Select x-axis', column_list_for_plotting, index=0)
 selected_y_column = st.selectbox('Select y-axis', column_list_for_plotting, index=1)
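The guarded removal this commit introduces can be sketched in isolation. This is a minimal, self-contained example; the hard-coded column list below is a hypothetical stand-in for `filtered_data.columns.tolist()`, which may or may not contain either column:

```python
# Why the guard matters: list.remove raises ValueError when the item
# is absent, so unconditionally dropping optional columns can crash.
columns = ['model', 'URL', 'MMLU_average']  # hypothetical stand-in

for unwanted in ('URL', 'full_model_name'):
    if unwanted in columns:  # membership check before removal
        columns.remove(unwanted)

print(columns)  # ['model', 'MMLU_average']
```

Here `'full_model_name'` is absent, and without the `in` check the second `remove` call would raise `ValueError` instead of being skipped.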