AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
Abstract
Current approaches to training Process Reward Models (PRMs) typically break responses into multiple reasoning steps using rule-based techniques, such as splitting on predefined placeholder tokens or fixing each reasoning step to a set length. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division provides more decision-making information at each step, benefiting downstream tasks such as reward model learning, and requires no manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs on mathematical reasoning and code generation tasks. Experimental results indicate that the resulting PRM achieves state-of-the-art Best-of-N performance and surpasses the greedy search strategy when used for token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. In addition, we provide a thorough analysis and case studies of the PRM's performance, transferability, and generalization capabilities.
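The core mechanism described in the abstract — inserting a step boundary wherever the model is uncertain about the next token — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `adaptive_step_split`, the probability threshold, and the toy inputs are all hypothetical, and the only assumed input is a per-token log-probability trace of the sampled response.

```python
import math
from typing import List

def adaptive_step_split(
    tokens: List[str],
    token_logprobs: List[float],
    confidence_threshold: float = 0.8,  # illustrative value, not from the paper
) -> List[str]:
    """Split a sampled response into reasoning steps at low-confidence tokens.

    A token whose sampling probability falls below `confidence_threshold`
    is treated as a decision point: a step boundary is inserted before it,
    so each step ends right where the model was uncertain about what to
    generate next.
    """
    assert len(tokens) == len(token_logprobs)
    steps, current = [], []
    for tok, logprob in zip(tokens, token_logprobs):
        prob = math.exp(logprob)
        # Low confidence in this token -> close the previous step.
        if current and prob < confidence_threshold:
            steps.append("".join(current))
            current = []
        current.append(tok)
    if current:
        steps.append("".join(current))
    return steps

# Toy usage: confident tokens stay in one step; a confidence dip opens a new one.
toks = ["The", " answer", " is", " 4", "2", "."]
lps = [math.log(p) for p in [0.95, 0.9, 0.92, 0.3, 0.88, 0.99]]
print(adaptive_step_split(toks, lps))
# -> ['The answer is', ' 42.']
```

In contrast to rule-based splitting on newlines or fixed lengths, a confidence-based boundary like this lands at points where the continuation is genuinely ambiguous, which is where a process reward signal is most informative.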
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding (2025)
- Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning (2025)
- The Lessons of Developing Process Reward Models in Mathematical Reasoning (2025)
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning (2025)
- AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification (2025)
- VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data (2025)
- Reinforcing Thinking through Reasoning-Enhanced Reward Models (2024)