I tested this tool on some HelpSteer2 validation data and found that the scores it outputs differ significantly from the scores given in HelpSteer2. Why?