flowers-team/StickToYourRoleLeaderboard

grg committed on Sep 28

Commit 1d3b79e (1 parent: abb889e)

update phrasing

Files changed (3)

templates/about.html CHANGED

@@ -252,9 +252,9 @@
                 Here are the considered context chunks:
             </p>
             <ul>
-                <li> <b> no_conv </b>: no conversation is simulated the questions from the PVQ-40 questionnaire are given directly </li>
-                <li> <b> no_conv_svs </b>: no conversation is simulated the questions from the SVS questionnaire are given directly </li>
-                <li> <b> chunk_0-chunk-4 </b>: <a target="_blank" href="https://gitlab.inria.fr/gkovac/value_stability/-/tree/master/contexts/leaderboard_reddit_chunks?ref_type=heads">50 reddit posts</a> used as the initial Interlocutor model messages (one per persona). chunk_0 contains the longest posts, chunk_4 the shortest. </li>
+                <li> <b> no_conv </b>: no conversation is simulated and the questions from the PVQ-40 questionnaire are given directly </li>
+                <li> <b> no_conv_svs </b>: no conversation is simulated and the questions from the SVS questionnaire are given directly </li>
+                <li> <b> chunk_0-chunk-4 </b>: each chunk has 50 reddit posts, which are used as the initial Interlocutor model messages (one per persona). chunk_0 contains the longest posts, chunk_4 the shortest. </li>
                 <li> <b> chess </b>: "1. e4" is given as the initial message to all personas, but for each persona the Interlocutor model is instructed to simulate a different persona (instead of a human user) </li>
                 <li> <b> grammar </b>: like chess, but "Can you check this sentence for grammar? \n Whilst Jane was waiting to meet hers friend their nose started bleeding." is given as the initial message.
             </ul>
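The chunk layout in this hunk (five chunks of 50 posts, chunk_0 longest, chunk_4 shortest) can be illustrated with a short sketch. It assumes, without confirmation from the repository, that the chunks come from a single pool of posts sorted by length and cut into consecutive slices of 50; `make_chunks` is a hypothetical helper and the posts are synthetic.

```python
def make_chunks(posts: list[str], n_chunks: int = 5, chunk_size: int = 50) -> dict[str, list[str]]:
    """Hypothetical helper: split posts into chunk_0..chunk_{n-1} by decreasing length."""
    if len(posts) < n_chunks * chunk_size:
        raise ValueError("not enough posts to fill every chunk")
    # Longest posts first, so chunk_0 receives the longest and the last chunk the shortest.
    ordered = sorted(posts, key=len, reverse=True)
    return {
        f"chunk_{i}": ordered[i * chunk_size:(i + 1) * chunk_size]
        for i in range(n_chunks)
    }


# Toy usage with synthetic posts of lengths 1..250 (not real reddit data).
posts = ["x" * n for n in range(1, 251)]
chunks = make_chunks(posts)
# One post per persona: persona k in chunk_i would get chunks[f"chunk_{i}"][k] as its first message.
print(len(chunks["chunk_0"]), len(chunks["chunk_0"][0]), len(chunks["chunk_4"][-1]))  # prints: 50 250 1
```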

templates/index.html CHANGED

@@ -321,14 +321,14 @@
                 <li><b>Ordinal - Win Rate</b> -
                     <i>Which model beats the most other models across most metrics?</i>
                     <div style="margin-left: 20px; margin-top: 5px">
-                    The score averaged over all metrics (with descending metrics inverted), context pairs (for stability) and contexts (for validity metrics)
-                    <div>
+                        The percentage of won games, where a game is a comparison of each model pair, each metric, and each context pair (for stability) or context (for validity metrics)
+                    </div>
                 </li>
                 <li><b>Cardinal - Score</b> -
                     <i>Which model has the highest average score?</i>
                     <div style="margin-left: 20px; margin-top: 5px">
-                    The percentage of won games, where a game is a comparison of each model pair, each metric, and each context pair (for stability) or context (for validity metrics)
-                    </div>
+                        The score averaged over all metrics (with descending metrics inverted), context pairs (for stability) and contexts (for validity metrics)
+                    <div>
                 </li>
             </ul>
         </p>
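The two aggregates described in this hunk can be sketched in Python. This is an illustration under stated assumptions, not the leaderboard's implementation: results are assumed to be stored as `scores[model][metric][key]` (a hypothetical layout, with `key` naming a context pair for stability metrics or a context for validity metrics), a tie is assumed to count as half a win, "inverting" a descending metric is assumed to mean `1 - x` for metrics in [0, 1], and the averaging order (contexts first, then metrics) is a guess.

```python
from itertools import combinations
from statistics import mean

# Hypothetical layout: scores[model][metric][key] -> float, where key names a
# context pair (stability metrics) or a single context (validity metrics).
Scores = dict[str, dict[str, dict[str, float]]]


def win_rates(scores: Scores, descending: set[str]) -> dict[str, float]:
    """Percentage of won games; one game per model pair, metric and context key."""
    won = {m: 0.0 for m in scores}
    played = {m: 0 for m in scores}
    for a, b in combinations(scores, 2):
        for metric, by_key in scores[a].items():
            for key, va in by_key.items():
                vb = scores[b][metric][key]
                if metric in descending:  # lower is better, so flip the sign
                    va, vb = -va, -vb
                played[a] += 1
                played[b] += 1
                if va > vb:
                    won[a] += 1
                elif vb > va:
                    won[b] += 1
                else:  # assumption: a tie is half a win for each model
                    won[a] += 0.5
                    won[b] += 0.5
    return {m: 100 * won[m] / played[m] if played[m] else 0.0 for m in scores}


def average_scores(scores: Scores, descending: set[str]) -> dict[str, float]:
    """Mean over context keys, then over metrics; descending metrics become 1 - x."""
    return {
        model: mean(
            mean(1 - v if metric in descending else v for v in by_key.values())
            for metric, by_key in metrics.items()
        )
        for model, metrics in scores.items()
    }


# Toy input (not leaderboard data): m_high is ascending, m_low is descending.
toy = {
    "A": {"m_high": {"pair_1": 0.9}, "m_low": {"ctx_1": 0.1}},
    "B": {"m_high": {"pair_1": 0.5}, "m_low": {"ctx_1": 0.3}},
}
print(win_rates(toy, {"m_low"}))  # {'A': 100.0, 'B': 0.0}
print(average_scores(toy, {"m_low"}))
```

The win rate only compares orderings, so it is insensitive to the scale of each metric; the average score is cardinal and therefore depends on how descending metrics are inverted.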

templates/model_detail.html CHANGED

@@ -267,7 +267,9 @@
             <h2>Visualizing the order of simulated personas</h2>
             <p>
                 This image shows the order of personas in each context chunk for each value.
+                A chunk refers to the set of text (e.g. reddit posts) that are used to start conversations with different characters.
                 For each value (row), the personas are ordered on the x-axis by their expression of this value in the `no_conv` setting (gray).
+                In this setting no conversation is simulated and values are scored with PVQ.
                 Therefore, the Rank-Order stability between the `no_conv` chunk and some chunk corresponds to the extent to which the curve is increasing in that chunk.
             </p>
             <div class="image-container">
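The link between an increasing curve and Rank-Order stability can be made concrete with a small sketch. It assumes that Rank-Order stability for one value is measured with Spearman's rank correlation between the personas' scores in `no_conv` and in another chunk; `average_ranks` and `spearman` are hypothetical helpers written in plain Python (`scipy.stats.spearmanr` computes the same statistic), and the scores are toy numbers. When personas are sorted by their `no_conv` score, a chunk curve that rises everywhere gives a correlation of 1.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation between the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sxx * syy) ** 0.5


# Toy scores for one value, one entry per persona (not real leaderboard data).
no_conv = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = [0.5, 0.9, 2.0, 2.1, 3.0]  # curve rises when personas are sorted by no_conv
shuffled = [3.0, 0.5, 2.1, 0.9, 2.0]    # curve goes up and down

# Order personas on the x-axis by their no_conv score, as in the figure.
x_order = sorted(range(len(no_conv)), key=lambda p: no_conv[p])
print([increasing[p] for p in x_order])  # monotonically increasing sequence
print(spearman(no_conv, increasing), spearman(no_conv, shuffled))
```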