Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 26 days ago • 58
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training Paper • 2602.03411 • Published Feb 3 • 37
SWE-World: Building Software Engineering Agents in Docker-Free Environments Paper • 2602.03419 • Published Feb 3 • 40
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds Paper • 2412.05631 • Published Dec 7, 2024 • 2
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published Jan 28 • 120
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published Jan 28 • 120 • 19
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published Jan 28 • 120 • 19
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds Paper • 2412.05631 • Published Dec 7, 2024 • 2