Issue with Reproducing Results on v1.1 Dataset (HumanEval Score Lower Than Expected)

by XXSg559 - opened Jun 8

Hello,

I used your v1.1 dataset and the code from the paper's repository.
However, following the settings described in the paper, I wasn't able to reproduce the reported results: my best HumanEval score was only 0.7, which is still well short of the reported 0.8.

Even after increasing the learning rate, the best checkpoint I could find reached only 0.77, and I'm honestly quite confused about where the remaining gap comes from.

Also, the provided test file appears to be incorrect: the .json file actually contains HTML content.
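
In case it helps with checking, here is a minimal sketch of how one could confirm whether a downloaded file is really JSON or an HTML page saved under a .json name. The function name `sniff_json_file` and the path `test.json` are placeholders of mine, not names from the repository.

```python
import json


def sniff_json_file(path):
    """Return 'html', 'json', or 'unknown' for the file at `path`.

    Hypothetical helper for illustration only.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()

    # An HTML page usually starts with a doctype or an <html> tag.
    head = text.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"

    # Otherwise, see whether the whole file parses as a single JSON document.
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        return "unknown"


if __name__ == "__main__":
    # "test.json" is a placeholder path; point it at the downloaded file.
    try:
        print(sniff_json_file("test.json"))
    except FileNotFoundError:
        print("test.json not found")
```

Note that a JSON Lines file (one object per line) would be reported as 'unknown' by this check, since it only tries to parse the file as a single JSON document.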

Thank you very much for your contributions!
