Bridging Offline and Online Reinforcement Learning for LLMs Paper • 2506.21495 • Published 7 days ago • 1
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published 2 days ago • 35
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published 3 days ago • 36
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling Paper • 2506.22049 • Published 7 days ago • 2
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published 10 days ago • 43 • 5
ReDit: Reward Dithering for Improved LLM Policy Optimization Paper • 2506.18631 • Published 11 days ago • 7
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Paper • 2506.18880 • Published 10 days ago • 1
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published 11 days ago • 31
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published 28 days ago • 62
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 95
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 132
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B Text Generation • 2B • Updated 29 days ago • 14.5k • 170