grg committed on
Commit 846ce90 · 1 Parent(s): bce438f

update methodology with CoT

Files changed (1)
  1. templates/about.html +1 -0
templates/about.html CHANGED
@@ -369,6 +369,7 @@ their expression of that value).
  These changes were made to keep up with the newly released model and to make the evaluation more detailed.
  We describe additions made in the leaderboard here for clarity:
  <ol>
+ <li>Chain-of-Thought (CoT) evaluation was used</li>
  <li>a new population was created and was balanced with respect to gender</li>
  <li>context chunks - instead of evaluating the stability of a population between pairs of contexts, where all personas are given the same topic (e.g. chess), we evaluate it between pairs of context chunks, where each participant is given a different random context</li>
  <li>more diverse and longer contexts (up to 6k tokens) were created with reddit posts from the <a target="_blank" href="https://webis.de/data/webis-tldr-17.html">webis dataset</a> (the dataset was cleaned to exclude posts from NSFW subreddits)</li>
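The context-chunk evaluation described in the diff can be illustrated with a minimal Python sketch. This is not the leaderboard's code. It assumes that within a chunk every persona receives a distinct context sampled without replacement, and that stability between two chunks is the Spearman rank correlation of the personas' scores on a single value. All function names and example scores are hypothetical.

```python
import math
import random
from statistics import mean


def assign_context_chunks(personas, contexts, n_chunks, seed=0):
    """Hypothetical: in each chunk, give every persona a different random context."""
    if len(contexts) < len(personas):
        raise ValueError("need at least one distinct context per persona")
    rng = random.Random(seed)
    chunks = []
    for _ in range(n_chunks):
        # Sampling without replacement keeps contexts distinct within one chunk.
        sampled = rng.sample(contexts, len(personas))
        chunks.append(dict(zip(personas, sampled)))
    return chunks


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either side is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def rank_order_stability(scores_a, scores_b):
    """Assumed metric: Spearman correlation of one value's scores, chunk A vs chunk B."""
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both chunks must score the same personas")
    personas = sorted(scores_a)
    return spearman([scores_a[p] for p in personas], [scores_b[p] for p in personas])


if __name__ == "__main__":
    personas = ["p1", "p2", "p3", "p4"]
    contexts = [f"post_{i}" for i in range(10)]  # stand-ins for Reddit posts
    print(assign_context_chunks(personas, contexts, n_chunks=2))
    # Made-up scores for one value; the persona ordering is identical in both chunks.
    chunk_a = {"p1": 4.0, "p2": 2.5, "p3": 5.0, "p4": 1.0}
    chunk_b = {"p1": 3.5, "p2": 3.0, "p3": 5.5, "p4": 1.5}
    print(rank_order_stability(chunk_a, chunk_b))  # → 1.0
```

The sketch depends only on the standard library, so no SciPy is needed. With a population-level score per value, averaging the correlation over all pairs of chunks would give one stability number. That averaging step is also an assumption.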