update phrasing
- templates/about.html +3 -3
- templates/index.html +4 -4
- templates/model_detail.html +2 -0
templates/about.html
CHANGED
@@ -252,9 +252,9 @@
|
|
252 |
Here are the considered context chunks:
|
253 |
</p>
|
254 |
<ul>
|
255 |
-
<li> <b> no_conv </b>: no conversation is simulated the questions from the PVQ-40 questionnaire are given directly </li>
|
256 |
-
<li> <b> no_conv_svs </b>: no conversation is simulated the questions from the SVS questionnaire are given directly </li>
|
257 |
-
<li> <b> chunk_0-chunk-4 </b>:
|
258 |
<li> <b> chess </b>: "1. e4" is given as the initial message to all personas, but for each persona the Interlocutor model is instructed to simulate a different persona (instead of a human user) </li>
|
259 |
<li> <b> grammar </b>: like chess, but "Can you check this sentence for grammar? \n Whilst Jane was waiting to meet hers friend their nose started bleeding." is given as the initial message.
|
260 |
</ul>
|
|
|
252 |
Here are the considered context chunks:
|
253 |
</p>
|
254 |
<ul>
|
255 |
+
<li> <b> no_conv </b>: no conversation is simulated and the questions from the PVQ-40 questionnaire are given directly </li>
|
256 |
+
<li> <b> no_conv_svs </b>: no conversation is simulated and the questions from the SVS questionnaire are given directly </li>
|
257 |
+
<li> <b> chunk_0-chunk-4 </b>: each chunk has 50 Reddit posts, which are used as the initial Interlocutor model messages (one per persona). chunk_0 contains the longest posts, chunk_4 the shortest. </li>
|
258 |
<li> <b> chess </b>: "1. e4" is given as the initial message to all personas, but for each persona the Interlocutor model is instructed to simulate a different persona (instead of a human user) </li>
|
259 |
<li> <b> grammar </b>: like chess, but "Can you check this sentence for grammar? \n Whilst Jane was waiting to meet hers friend their nose started bleeding." is given as the initial message.
|
260 |
</ul>
|
templates/index.html
CHANGED
@@ -321,14 +321,14 @@
|
|
321 |
<li><b>Ordinal - Win Rate</b> -
|
322 |
<i>Which model beats the most other models across most metrics?</i>
|
323 |
<div style="margin-left: 20px; margin-top: 5px">
|
324 |
-
|
325 |
-
|
326 |
</li>
|
327 |
<li><b>Cardinal - Score</b> -
|
328 |
<i>Which model has the highest average score?</i>
|
329 |
<div style="margin-left: 20px; margin-top: 5px">
|
330 |
-
|
331 |
-
|
332 |
</li>
|
333 |
</ul>
|
334 |
</p>
|
|
|
321 |
<li><b>Ordinal - Win Rate</b> -
|
322 |
<i>Which model beats the most other models across most metrics?</i>
|
323 |
<div style="margin-left: 20px; margin-top: 5px">
|
324 |
+
The percentage of games won, where a game is one comparison between a pair of models on a single metric and a single context pair (for stability metrics) or context (for validity metrics)
|
325 |
+
</div>
|
326 |
</li>
|
327 |
<li><b>Cardinal - Score</b> -
|
328 |
<i>Which model has the highest average score?</i>
|
329 |
<div style="margin-left: 20px; margin-top: 5px">
|
330 |
+
The score averaged over all metrics (with descending metrics inverted), context pairs (for stability metrics), and contexts (for validity metrics)
|
331 |
+
</div>
|
332 |
</li>
|
333 |
</ul>
|
334 |
</p>
|
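The two aggregation rules described in the new index.html text can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual code: the function names, the `scores` layout (`{model: {(metric, context): value}}`), the 50/50 split on ties, and the inversion of descending metrics as `1 - value` are all assumptions made for the example.

```python
from itertools import combinations


def oriented(scores, descending=frozenset()):
    """Flip lower-is-better metrics so that a higher value always wins.

    Assumption: values lie in [0, 1], so inversion is done as 1 - value.
    """
    return {
        model: {
            key: (1.0 - value if key[0] in descending else value)
            for key, value in entries.items()
        }
        for model, entries in scores.items()
    }


def win_rates(scores, descending=frozenset()):
    """Percentage of games won per model.

    One game = one model pair compared on one (metric, context) key.
    """
    s = oriented(scores, descending)
    wins = {m: 0.0 for m in s}
    games = {m: 0 for m in s}
    for a, b in combinations(sorted(s), 2):
        # Only keys that both models have can be played as a game.
        for key in s[a].keys() & s[b].keys():
            games[a] += 1
            games[b] += 1
            if s[a][key] > s[b][key]:
                wins[a] += 1
            elif s[a][key] < s[b][key]:
                wins[b] += 1
            else:
                # Assumption: a tie counts as half a win for each side.
                wins[a] += 0.5
                wins[b] += 0.5
    return {m: (100.0 * wins[m] / games[m] if games[m] else 0.0) for m in s}


def cardinal_scores(scores, descending=frozenset()):
    """Mean of the oriented values over all (metric, context) keys."""
    s = oriented(scores, descending)
    return {m: sum(e.values()) / len(e) for m, e in s.items()}
```

For stability metrics the second element of each key would be a context pair, and for validity metrics a single context; the sketch treats both as opaque labels.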
templates/model_detail.html
CHANGED
@@ -267,7 +267,9 @@
|
|
267 |
<h2>Visualizing the order of simulated personas</h2>
|
268 |
<p>
|
269 |
This image shows the order of personas in each context chunk for each value.
|
|
|
270 |
For each value (row), the personas are ordered on the x-axis by their expression of this value in the `no_conv` setting (gray).
|
|
|
271 |
Therefore, the Rank-Order stability between the `no_conv` chunk and some chunk corresponds to the extent to which the curve is increasing in that chunk.
|
272 |
</p>
|
273 |
<div class="image-container">
|
|
|
267 |
<h2>Visualizing the order of simulated personas</h2>
|
268 |
<p>
|
269 |
This image shows the order of personas in each context chunk for each value.
|
270 |
+
A chunk is the set of texts (e.g. Reddit posts) used to start conversations with different characters.
|
271 |
For each value (row), the personas are ordered on the x-axis by their expression of this value in the `no_conv` setting (gray).
|
272 |
+
In this setting no conversation is simulated and values are scored with PVQ.
|
273 |
Therefore, the Rank-Order stability between the `no_conv` chunk and some chunk corresponds to the extent to which the curve is increasing in that chunk.
|
274 |
</p>
|
275 |
<div class="image-container">
|
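The link between an increasing curve and Rank-Order stability in the model_detail.html text can be illustrated with a rank correlation. The sketch below uses Spearman's rank correlation as one common way to quantify how well the persona order is preserved; whether the leaderboard uses this exact statistic is an assumption, and the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def rank_order_stability(no_conv, chunk):
    """Spearman correlation between persona scores in two settings.

    Both lists are indexed by persona. A value near 1 means that personas
    sorted by their `no_conv` score trace an increasing curve in `chunk`.
    """
    return pearson(ranks(no_conv), ranks(chunk))
```

With personas sorted on the x-axis by their `no_conv` score, a perfectly increasing curve in another chunk gives a value of 1, and a perfectly decreasing one gives -1.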