arxiv:2510.02919

Self-Reflective Generation at Test Time

Published on Oct 3

· Submitted by

mj on Oct 7

Authors:

Abstract

SRGen, a lightweight test-time framework, improves LLM reasoning by dynamically identifying and correcting high-uncertainty tokens during generation, leading to better single-pass quality and self-consistency.

AI-generated summary

Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought, but their forward-only autoregressive generation process is fragile; early token errors can cascade, which creates a clear need for self-reflection mechanisms. However, existing self-reflection either performs revisions over full drafts or learns self-correction via expensive training, both fundamentally reactive and inefficient. To address this, we propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points. During token generation, SRGen utilizes dynamic entropy thresholding to identify high-uncertainty tokens. For each identified token, it trains a specific corrective vector, which fully exploits the already generated context for a self-reflective generation to correct the token probability distribution. By retrospectively analyzing the partial output, this self-reflection enables more trustworthy decisions, thereby significantly reducing the probability of errors at highly uncertain points. Evaluated on challenging mathematical reasoning benchmarks and a diverse set of LLMs, SRGen can consistently strengthen model reasoning: improvements in single-pass quality also translate into stronger self-consistency voting. Especially, on AIME2024 with DeepSeek-R1-Distill-Qwen-7B, SRGen yields absolute improvements of +12.0% on Pass@1 and +13.3% on Cons@5. Moreover, our findings position SRGen as a plug-and-play method that integrates reflection into the generation process for reliable LLM reasoning, achieving consistent gains with bounded overhead and broad composability with other training-time (e.g., RLHF) and test-time (e.g., SLOT) techniques.

View arXiv page View PDF GitHub 0 Add to collection

Community

mujianijan

Paper submitter about 19 hours ago

•

edited about 19 hours ago

SRGen brings reflection into generation at test time by detecting high-uncertainty tokens, learning tiny per-token corrective vectors from the partial output, and continuing with a corrected distribution, which reduces error cascades and strengthens single-pass as well as self-consistency reasoning; on AIME2024 with DeepSeek-R1-Distill-Qwen-7B it delivers +12.0% Pass@1 and +13.3% Cons@5, with bounded overhead and clean compatibility with training-time and test-time techniques.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.02919 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.02919 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.02919 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.