update methodology with CoT
templates/about.html (CHANGED, +1 -0)
@@ -369,6 +369,7 @@ their expression of that value).
369 These changes were made to keep up with the newly released model and to make the evaluation more detailed.
370 We describe additions made in the leaderboard here for clarity:
371 <ol>
372 <li>a new population was created and was balanced with respect to gender</li>
373 <li>context chunks - instead of evaluating the stability of a population between pairs of contexts, where all personas are given the same topic (e.g. chess), we evaluate it between pairs of context chunks, where each participant is given a different random context</li>
374 <li>more diverse and longer contexts (up to 6k tokens) were created with reddit posts from the <a target="_blank" href="https://webis.de/data/webis-tldr-17.html">webis dataset</a> (the dataset was cleaned to exclude posts from NSFW subreddits)</li>

369 These changes were made to keep up with the newly released model and to make the evaluation more detailed.
370 We describe additions made in the leaderboard here for clarity:
371 <ol>
372 +<li>Chain-of-Thought (CoT) evaluation was used</li>
373 <li>a new population was created and was balanced with respect to gender</li>
374 <li>context chunks - instead of evaluating the stability of a population between pairs of contexts, where all personas are given the same topic (e.g. chess), we evaluate it between pairs of context chunks, where each participant is given a different random context</li>
375 <li>more diverse and longer contexts (up to 6k tokens) were created with reddit posts from the <a target="_blank" href="https://webis.de/data/webis-tldr-17.html">webis dataset</a> (the dataset was cleaned to exclude posts from NSFW subreddits)</li>
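The context-chunk protocol described in this diff can be illustrated with a small Python sketch. It is not the leaderboard's code, and several pieces are assumptions: the helper names (`assign_context_chunk`, `rank_order_stability`) are hypothetical, contexts are assumed to be drawn without replacement so no two personas in a chunk share one, and "stability" is assumed to mean rank-order stability, i.e. the mean over values of the Spearman correlation between the population's scores in two chunks. The context pool stands in for the cleaned webis reddit posts, and the scores are made up purely to exercise the metric.

```python
import random
from statistics import mean


def assign_context_chunk(persona_ids, context_pool, rng):
    """Hypothetical helper: give every persona its own randomly drawn context.

    Assumes contexts are drawn without replacement, so no two personas
    in the same chunk share a context.
    """
    if len(context_pool) < len(persona_ids):
        raise ValueError("context pool is smaller than the population")
    drawn = rng.sample(context_pool, len(persona_ids))
    return dict(zip(persona_ids, drawn))


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def rank_order_stability(scores_a, scores_b, value_names):
    """Assumed metric: mean over values of the Spearman correlation between
    the population's scores in chunk A and chunk B.

    scores_x maps persona id -> {value name -> score}.
    """
    personas = sorted(scores_a)  # fixed order so both score lists line up
    per_value = [
        spearman([scores_a[p][v] for p in personas],
                 [scores_b[p][v] for p in personas])
        for v in value_names
    ]
    return mean(per_value)


if __name__ == "__main__":
    rng = random.Random(0)
    personas = ["p1", "p2", "p3"]
    pool = [f"reddit_post_{i}" for i in range(10)]  # stand-in for cleaned webis posts
    chunk_a = assign_context_chunk(personas, pool, rng)
    chunk_b = assign_context_chunk(personas, pool, rng)
    print(chunk_a, chunk_b)
    # Made-up scores, only to exercise the metric.
    scores_a = {"p1": {"Power": 1.0, "Benevolence": 5.0},
                "p2": {"Power": 3.0, "Benevolence": 4.0},
                "p3": {"Power": 5.0, "Benevolence": 2.0}}
    scores_b = {"p1": {"Power": 2.0, "Benevolence": 6.0},
                "p2": {"Power": 3.5, "Benevolence": 4.5},
                "p3": {"Power": 4.0, "Benevolence": 1.0}}
    print(rank_order_stability(scores_a, scores_b, ["Power", "Benevolence"]))
```

The key difference from same-topic evaluation is that each persona's context is sampled independently per chunk, so a high correlation means the population keeps its value ordering even when every participant reads something different. In a real run, the scores for each chunk would come from the model's answers under that chunk's contexts, which this sketch does not model.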