Corey Morris
committed
Commit: e7c50af
Parent(s): e1345be
changed the wording of moral scenarios
app.py
CHANGED
@@ -340,21 +340,23 @@ fig = create_plot(filtered_data, 'Parameters', 'MMLU_abstract_algebra')
 st.plotly_chart(fig)
 
 # Moral scenarios plots
-st.markdown("### Moral Scenarios")
+st.markdown("### MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures")
 def show_random_moral_scenarios_question():
     moral_scenarios_data = pd.read_csv('moral_scenarios_questions.csv')
     random_question = moral_scenarios_data.sample()
     expander = st.expander("Show a random moral scenarios question")
     expander.write(random_question['query'].values[0])
 
-
+
 
 st.write("""
-
-
-
+After a deeper dive into the moral scenarios task, it appears that benchmark is not a valid measurement of moral judgement.
+The challenges these models face are not rooted in understanding each scenario, but rather in the structure of the task itself.
+I would recommend using a different benchmark for moral judgement. More details of the analysis can be found here: [MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures ](https://medium.com/p/74fd6e512521)
 """)
 
+show_random_moral_scenarios_question()
+
 fig = create_plot(filtered_data, 'Parameters', 'MMLU_moral_scenarios', title="Impact of Parameter Count on Accuracy for Moral Scenarios")
 st.plotly_chart(fig)
 st.write()
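Outside of Streamlit, the sampling logic inside `show_random_moral_scenarios_question` can be sketched with plain pandas. This is a minimal sketch: the inline CSV stands in for `moral_scenarios_questions.csv` (only its `query` column is known from the diff), and `random_state` is added here just to make the example deterministic.

```python
import io

import pandas as pd

# Stand-in for moral_scenarios_questions.csv; the real file has a 'query' column.
csv_text = "query\nScenario 1: placeholder question\nScenario 2: placeholder question\n"
moral_scenarios_data = pd.read_csv(io.StringIO(csv_text))

# .sample() returns a one-row DataFrame; ['query'].values[0] unwraps the string,
# which is what expander.write(random_question['query'].values[0]) displays in app.py.
random_question = moral_scenarios_data.sample(random_state=0)
question_text = random_question['query'].values[0]
print(question_text)
```

In the app itself the unwrapped string goes to `expander.write(...)` so the question stays collapsed until the reader opens the expander.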