J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization • Paper • 2505.13346 • Published May 19