Through the Valley: Path to Effective Long CoT Training for Small Language Models • Paper • arXiv:2506.07712 • Published 6 days ago
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning • Paper • arXiv:2407.00782 • Published Jun 30, 2024