Bridging Offline and Online Reinforcement Learning for LLMs Paper • 2506.21495 • Published 7 days ago • 1
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published 2 days ago • 35
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published 3 days ago • 36
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling Paper • 2506.22049 • Published 7 days ago • 2
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published 10 days ago • 43
ReDit: Reward Dithering for Improved LLM Policy Optimization Paper • 2506.18631 • Published 11 days ago • 7
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Paper • 2506.18880 • Published 10 days ago • 1
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published 11 days ago • 31
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published 28 days ago • 62
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 95
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 132
AbsenceBench: Language Models Can't Tell What's Missing Paper • 2506.11440 • Published 21 days ago • 1
VerIF Collection RL-trained models and datasets for instruction-following • 7 items • Updated 22 days ago • 3
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Paper • 2506.09942 • Published 22 days ago • 6
General-Reasoner Collection Advancing LLMs' general reasoning capabilities • 9 items • Updated 9 days ago • 4
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published 23 days ago • 26