OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25 • 46
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30 • 13
FuseO1-Preview: System-II Reasoning Fusion of LLMs Article • By Wanfq and 4 others • Jan 20 • 22
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 26