Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published 4 days ago • 30
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published 15 days ago • 106
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward Paper • 2508.03686 • Published 10 days ago • 32
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published 24 days ago • 36
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Article • By abidlabs and 4 others • 18 days ago • 153
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published 25 days ago • 19
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published about 1 month ago • 41
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published Jul 11 • 79
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 86
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26 • 28
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25 • 46
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Paper • 2506.19290 • Published Jun 24 • 50
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 68
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 128
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond Paper • 2505.19641 • Published May 26 • 67