VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published 11 days ago
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning Paper • 2504.00891 • Published 17 days ago
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 17 days ago
General Reasoning Requires Learning to Reason from the Get-go Paper • 2502.19402 • Published Feb 26
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17, 2024
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published 28 days ago
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity? Paper • 2503.15242 • Published 30 days ago
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 29 days ago
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published about 1 month ago
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3