CoreyMorris committed
Commit e05c716
Parent(s): 36799a9
updated with new data
Files changed:
- app.py +1 -0
- processed_data_2024-04-16.csv +0 -0
app.py
CHANGED
@@ -115,6 +115,7 @@ st.title('Interactive Portal for Analyzing Open Source Large Language Models')
 st.markdown("""***Last updated March 17th 2024***""")
 st.markdown("""**It has not been updated to correctly extract the parameter number from mixture of experts models.**""")
 st.markdown("""**As of 04-17-2024, this data was not generated using the chat templates. Smaller models are especially sensative to this and other aspects related to the format of the inputs.**""")
+st.markdown("""For a good sense of general relative performance of models, I would highly reccomend this leaderboard https://chat.lmsys.org/""")
 st.markdown("""
 This page provides a way to explore the results for individual tasks and compare models across tasks. Data for the benchmarks hellaswag, arc_challenge, and truthfulQA have also been included for comparison.
 There are 57 tasks in the MMLU evaluation that cover a wide variety of subjects including Science, Math, Humanities, Social Science, Applied Science, Logic, and Security.
processed_data_2024-04-16.csv
ADDED
The diff for this file is too large to render.