Submitted by Yongding Tao
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Peking University