LADDER: Self-Improving LLMs Through Recursive Problem Decomposition • Paper • 2503.00735 • Published Mar 2 • 22
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning • Paper • 2503.05592 • Published Mar 7 • 27
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning • Paper • 2503.05379 • Published Mar 7 • 39
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't • Paper • 2503.16219 • Published Mar 20 • 51
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B • Text Generation • 2B • Updated 10 days ago • 9.73k • 180
microsoft/Phi-4-mini-flash-reasoning • Text Generation • 4B • Updated 14 days ago • 12.5k • 212