grg committed on
Commit b19d196
1 Parent(s): ca60da9

Decorative changes

Files changed (2)
  1. templates/about.html +29 -12
  2. templates/model_detail.html +3 -1
templates/about.html CHANGED
@@ -65,12 +65,32 @@
65
  border-top-right-radius: 10px;
66
  }
67
  .section{
 
 
 
 
 
 
 
 
 
 
 
 
 
 
68
  padding-left: 150px;
69
  padding-right: 150px;
 
 
 
70
  text-align: left;
 
71
  }
 
72
  .citation-section {
73
- margin-top: 5px;
 
74
  text-align: center;
75
  }
76
  .citation-box {
@@ -100,7 +120,6 @@
100
  font-size: 24px;
101
  font-weight: bold;
102
  text-align: center;
103
- margin-top: 40px;
104
  margin-bottom: 40px;
105
  padding: 20px; /* Add padding for more margin around text */
106
  background-color: #610b5d;
@@ -109,7 +128,7 @@
109
  }
110
  .back-button {
111
  text-align: center;
112
- margin-top: 20px;
113
  }
114
  .custom-button {
115
  background-color: #610b5d;
@@ -135,33 +154,31 @@
135
  <div class="section">
136
  <div class="section-title">Motivation</div>
137
  <p>
138
- Benchmarks usually compare models with MANY QUESTIONS from A SINGLE MINIMAL CONTEXT, e.g. as multiple choices questions.
139
  This kind of evaluation is little informative of LLMs' behavior in deployment when exposed to new contexts (especially when we consider the LLMs highly context-dependant nature).
140
- We argue that CONTEXT-DEPENDENCE can be seen as a PROPERTY of LLMs: a dimension of LLM comparison alongside others like size, speed, or knowledge.
141
- We evaluate LLMs by asking the SAME QUESTIONS from MANY DIFFERENT CONTEXTS.
142
  </p>
143
  <p>
144
  LLMs are often used to simulate personas and populations.
145
  We study the coherence of simulated populations over different contexts (conversations on different topics).
146
  To do that we leverage the psychological methodology to study the interpersonal stability of personal value expression of those simulated populations.
147
- We adopt the Schwartz Theory of Basic Personal Values that defines 10 values: Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism.
148
- To score those values we use the associated questionnaires: PVQ-40, and SVS.
149
  </p>
150
  </div>
151
  <div class="section">
152
  <div class="section-title">Administering a questionnaire in context to a simulated persona</div>
153
- <p>To evaluate the stability on a population level we need to be able to evaluate a value profile expressed by a simulated individual in a specific context (conversation topic).</p>
154
  <ol>
155
  <li> The Tested model is instructed to simulate a persona</li>
156
  <li> A separate model instance - The Interlocutor - is instructed to simulate a “human using a chatbot”
157
  <li> A conversation topic is induced by manually setting the first Interlocutor’s message (e.g. Tell me a
158
  joke)
159
  <li> A conversation is simulated
160
- <li> A question from the questionnaire is set as the last Interlocutor’s message and The Tested model’s
161
  response is recorded (this is repeated for every item in the questionnaire)
162
  <li> The questionnaire is scored to obtain scores for the 10 personal values
163
- <li> The whole process is repeated for each persona with five different conversation topics
164
- <li> Rank-Order and Ipsative stability are estimated between pairs of contexts and then averaged
165
  </ol>
166
  <div class="image-container">
167
  <a href="{{ url_for('static', filename='figures/admin_questionnaire.svg') }}" target="_blank">
 
65
  border-top-right-radius: 10px;
66
  }
67
  .section{
68
+ padding-top: 19px;
69
+ text-align: left;
70
+ }
71
+
72
+ .section p {
73
+ padding-left: 150px;
74
+ padding-right: 150px;
75
+ text-indent: 2em;
76
+ margin: auto;
77
+ margin-bottom: 10px;
78
+ text-align: left;
79
+ }
80
+
81
+ .section ol,ul {
82
  padding-left: 150px;
83
  padding-right: 150px;
84
+ margin: auto;
85
+ margin-bottom: 20px;
86
+ margin-left: 50px;
87
  text-align: left;
88
+ margin-top: 0px;
89
  }
90
+
91
  .citation-section {
92
+ width: 100%;
93
+ margin-top: 50px;
94
  text-align: center;
95
  }
96
  .citation-box {
 
120
  font-size: 24px;
121
  font-weight: bold;
122
  text-align: center;
 
123
  margin-bottom: 40px;
124
  padding: 20px; /* Add padding for more margin around text */
125
  background-color: #610b5d;
 
128
  }
129
  .back-button {
130
  text-align: center;
131
+ margin-top: 50px;
132
  }
133
  .custom-button {
134
  background-color: #610b5d;
 
154
  <div class="section">
155
  <div class="section-title">Motivation</div>
156
  <p>
157
+ Benchmarks usually compare models with <b>MANY QUESTIONS</b> from <b>A SINGLE MINIMAL CONTEXT</b>, e.g. as multiple choices questions.
158
  This kind of evaluation is little informative of LLMs' behavior in deployment when exposed to new contexts (especially when we consider the LLMs highly context-dependant nature).
159
+ We argue that <b>CONTEXT-DEPENDENCE</b> can be seen as a <b>PROPERTY of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
160
+ We evaluate LLMs by asking the <b> SAME QUESTIONS </b> from <b> MANY DIFFERENT CONTEXTS </b>.
161
  </p>
162
  <p>
163
  LLMs are often used to simulate personas and populations.
164
  We study the coherence of simulated populations over different contexts (conversations on different topics).
165
  To do that we leverage the psychological methodology to study the interpersonal stability of personal value expression of those simulated populations.
166
+ We adopt the Schwartz Theory of Basic Personal Values that defines 10 values: Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism,
167
+ to evaluate their expression we use the associated questionnaires: PVQ-40, and SVS.
168
  </p>
169
  </div>
170
  <div class="section">
171
  <div class="section-title">Administering a questionnaire in context to a simulated persona</div>
172
+ <p>To evaluate the stability on a population level we need to be able to evaluate a <b>value profile</b> expressed by a <b>simulated individual</b> in a <b>specific context</b> (conversation topic). To do that we use the following procedure:</p>
173
  <ol>
174
  <li> The Tested model is instructed to simulate a persona</li>
175
  <li> A separate model instance - The Interlocutor - is instructed to simulate a “human using a chatbot”
176
  <li> A conversation topic is induced by manually setting the first Interlocutor’s message (e.g. Tell me a
177
  joke)
178
  <li> A conversation is simulated
179
+ <li> A question from the questionnaire is set as the last Interlocutor’s last message and The Tested model’s
180
  response is recorded (this is repeated for every item in the questionnaire)
181
  <li> The questionnaire is scored to obtain scores for the 10 personal values
 
 
182
  </ol>
183
  <div class="image-container">
184
  <a href="{{ url_for('static', filename='figures/admin_questionnaire.svg') }}" target="_blank">
templates/model_detail.html CHANGED
@@ -44,10 +44,12 @@
44
  margin-bottom: 20px;
45
  }
46
  .image-section p {
47
- width: 80%;
48
  margin: auto;
 
 
49
  margin-bottom: 20px;
50
  text-align: left;
 
51
  }
52
  .image-container {
53
  width: 100%;
 
44
  margin-bottom: 20px;
45
  }
46
  .image-section p {
 
47
  margin: auto;
48
+ padding-left: 150px;
49
+ padding-right: 150px;
50
  margin-bottom: 20px;
51
  text-align: left;
52
+ text-indent: 2em;
53
  }
54
  .image-container {
55
  width: 100%;