Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Paper • 2506.05256 • Published 28 days ago • 2
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 39
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 97
PERSONA: A Reproducible Testbed for Pluralistic Alignment Paper • 2407.17387 • Published Jul 24, 2024 • 20