Issue with Reproducing Results on v1.1 Dataset (HumanEval Score Lower Than Expected)

#1
by XXSg559 - opened

Hello,

I used your v1.1 dataset and the code from the paper's repository.
However, following the settings described in the paper, I wasn't able to reproduce the reported results: my best HumanEval score was only 0.7, well short of the reported 0.8.

Even after increasing the learning rate, the best checkpoint I could find only achieved 0.77, and now I'm honestly quite confused.

Also, it seems that the test file provided is incorrect—the .json file actually contains HTML content.
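In case it helps anyone confirm the file problem, here is a minimal sketch of the kind of check that shows it. The helper name `classify_file` and the path `test.json` are placeholders of mine, not names from the repository; the check only looks at whether the contents parse as JSON or start like an HTML page.

```python
import json
from pathlib import Path


def classify_file(path):
    """Return 'html', 'json', or 'unknown' based on the file's contents.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    head = text.lstrip()[:200].lower()

    # An HTML page (for example a saved web page or error page) typically
    # starts with a doctype declaration or an <html> tag.
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"

    # Otherwise, see whether the whole file parses as a single JSON document.
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        return "unknown"


if __name__ == "__main__":
    # "test.json" is an assumed placeholder path; point it at the downloaded file.
    target = Path("test.json")
    if target.exists():
        print(target, "->", classify_file(target))
```

If this reports `html` for a file with a `.json` extension, the download most likely saved a web page instead of the raw data file.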

Thank you very much for your contributions!
