Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision Paper • 2502.20790 • Published Feb 28