When to Trust Context: Self-Reflective Debates for Context Reliability
Abstract
A lightweight framework integrating token-level self-confidence and an asymmetric debate between agents enhances the robustness of large language models to contextual inconsistencies with minimal computational cost.
Large language models frequently encounter conflicts between their parametric knowledge and contextual input, often resulting in factual inconsistencies or hallucinations. We propose Self-Reflective Debate for Contextual Reliability (SR-DCR), a lightweight framework that integrates token-level self-confidence with an asymmetric multi-agent debate to adjudicate such conflicts. A critic, deprived of context, challenges a defender who argues from the given passage; a judge model evaluates the debate and determines the context's reliability. The final answer is selected by combining the verdict with model confidence. Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently enhances robustness to misleading context while maintaining accuracy on trustworthy inputs, outperforming both classical debate and confidence-only baselines with minimal computational overhead. The code is available at https://github.com/smiles724/Self-Reflective-Debates.
Community
When to Trust Context: Self-Reflective Debates for Context Reliability
In this work, we tackle a fundamental challenge in LLM alignment: how should a model respond when its internal knowledge disagrees with the context it's given?
We introduce SR-DCR (Self-Reflective Debate for Contextual Reliability), a lightweight framework that combines three components (see the sketch after this list):
Token-level self-confidence, to assess whether the model "knows" the answer on its own.
Asymmetric multi-agent debate, where one agent defends the context and another critiques it without access to the passage.
A judge model that adjudicates the debate and decides if the context should be trusted.
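Below is a minimal sketch of how these three components might fit together. It assumes a generic `ask_model` wrapper that returns an object with `.text` (and `.logprobs` when requested); the prompt templates, the confidence threshold, and the exact verdict-plus-confidence rule are illustrative placeholders, not the released implementation.

```python
# Minimal SR-DCR sketch (illustrative; see the repo for the actual implementation).
# `ask_model` is a hypothetical LLM wrapper: ask_model(prompt, return_logprobs=False)
# -> object with .text and, when requested, .logprobs (per-token log-probabilities).

def mean_token_logprob(logprobs: list[float]) -> float:
    """Token-level self-confidence: average log-probability of the tokens
    in the model's closed-book answer (one simple choice of estimator)."""
    return sum(logprobs) / max(len(logprobs), 1)

def sr_dcr(question: str, passage: str, ask_model, conf_threshold: float = -0.5) -> str:
    # 1. Closed-book answer and its token-level self-confidence.
    closed_book = ask_model(f"Answer from memory only.\nQ: {question}",
                            return_logprobs=True)
    is_confident = mean_token_logprob(closed_book.logprobs) >= conf_threshold

    # 2. Context-grounded answer from the retrieved passage.
    contextual = ask_model(f"Answer using only this passage:\n{passage}\n\nQ: {question}")

    # 3. Asymmetric debate: the defender argues from the passage; the critic
    #    never sees it and must rely on parametric knowledge alone.
    defense = ask_model(f"Defend this answer using the passage.\n"
                        f"Passage: {passage}\nAnswer: {contextual.text}")
    critique = ask_model(f"Without seeing any passage, critique this answer to "
                         f"'{question}': {contextual.text}")

    # 4. The judge reads both sides and rules on the context's reliability.
    verdict = ask_model("You are the judge. Reply 'context' if the passage seems "
                        "reliable, otherwise 'memory'.\n"
                        f"Defender: {defense.text}\nCritic: {critique.text}").text.lower()

    # 5. Combine verdict and confidence (one plausible rule): fall back to the
    #    parametric answer only when the judge distrusts the context AND the
    #    model was confident on its own; otherwise keep the contextual answer.
    return closed_book.text if ("memory" in verdict and is_confident) else contextual.text
```

The asymmetry is the point of the design: because the critic never sees the passage, the judge's verdict reflects whether the context survives a knowledge-only challenge, and the self-confidence score then decides whether a "memory" verdict is actually safe to act on.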
Our method improves robustness against hallucinated or misleading context, especially in retrieval-augmented generation (RAG) settings.
We evaluate on the ClashEval benchmark and show consistent gains over classical debate and confidence-only methods, across GPT-4o, Claude 3, and LLaMA 3.
Paper: https://arxiv.org/abs/2506.06020
Code: github.com/smiles724/Self-Reflective-Debates
We're excited to contribute tools for better interpretability, robustness, and reasoning in LLMs. Try it out and let us know what you think!
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models (2025)
- FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation (2025)
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG (2025)
- Judging with Many Minds: Do More Perspectives Mean Less Prejudice? (2025)
- Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models (2025)
- Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness (2025)
- Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement (2025)