This wasn't part of the original scheme... but it works very well if you have a domain-specific task to evaluate. In my experience, if you use a system prompt, you'll get within striking distance of using GPT-4o as a PRM (process reward model).
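
For anyone wanting to try this, here is a minimal sketch of prompting a general chat model to act as a step-level scorer through a system prompt. Everything in it is an assumption for illustration: the prompt wording, the `SCORE:` reply format, and the names `SYSTEM_PROMPT`, `parse_score`, `score_steps`, and `fake_judge` are hypothetical, and `judge` stands in for whatever model call you actually use.

```python
import re
from typing import Callable, List

# Hypothetical system prompt; the wording is an assumption, not from the comment above.
SYSTEM_PROMPT = (
    "You are grading one step of a solution to a domain-specific task. "
    "Reply with a single line of the form 'SCORE: <number between 0 and 1>'."
)


def parse_score(reply: str) -> float:
    """Extract the score from a judge reply, clamped to [0, 1]; 0.0 if none is found."""
    match = re.search(r"SCORE:\s*([0-9]*\.?[0-9]+)", reply)
    if match is None:
        return 0.0
    return max(0.0, min(1.0, float(match.group(1))))


def score_steps(
    problem: str,
    steps: List[str],
    judge: Callable[[str, str], str],
) -> List[float]:
    """Score each step in context of the steps before it.

    `judge` takes (system_prompt, user_prompt) and returns the model's text reply.
    """
    scores = []
    for i in range(len(steps)):
        shown = "\n".join(steps[: i + 1])
        user_prompt = (
            f"Problem:\n{problem}\n\nSteps so far:\n{shown}\n\nGrade the last step."
        )
        scores.append(parse_score(judge(SYSTEM_PROMPT, user_prompt)))
    return scores


def fake_judge(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real chat-model call, so the sketch runs offline."""
    return "SCORE: 0.8"
```

To use it for real, swap `fake_judge` for a function that sends the two prompts to your model and returns its reply; the per-step scores can then be aggregated (minimum, product, last step) however your setup requires.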