Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Paper • 2506.05256
Teaching language models to think efficiently with Adaptive Length Penalty (ALP)