Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
Abstract
Chain-of-thought prompting has demonstrated great success in facilitating the reasoning abilities of large language models. In this work, we explore how these enhanced reasoning abilities can be exploited to improve the robustness of large language models in tasks that are not necessarily reasoning-focused. In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations. Empirically, the improvements can be astounding, especially given the simplicity and applicability of the method. For example, in the Natural Questions task, the accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting when 1 out of 10 references provided is corrupted with prompt injection attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting maintains an accuracy of 50%.
Community
π‘οΈ Using Reasoning LLMs for Reliability
The world is investing heavily in reasoning LLMs β but π€ how can it help tasks that arenβt reasoning-intensive?
One angle:
Reasoning abilities (of LLMs) can be exploited for reliability!
We explored this and itβs surprisingly easy & surprisingly effective!
π Read the paper
π Background
LLMs are naturally limited in up-to-date or specialized knowledge.
Thatβs why so many β including OpenAI and Google β augment them with external references (e.g., RAG, search, deep research).
However, when those references are compromised, LLM performance can break down β raising serious reliability concerns:
- π Zou et al. (2024)
- π Greshake et al. (2023)
π§ Introducing Chain-of-Defensive-Thought
We propose a simple, prompting-only method called Chain-of-Defensive-Thought to enhance LLM robustness against corrupted external references.
- No fine-tuning needed
- Just a few exemplars with structured, defensive reasoning
Illustration:
π Key Results
Despite its simplicity, Chain-of-Defensive-Thought significantly improves LLM robustness across a wide range of models!
π Why It Matters
- Simple: Just prompting β no architecture changes.
- Effective: Major reliability improvements.
- Timely: Perfect for boosting systems based on RAG, search augmentation, and retrieval pipelines.
This could open up exciting new research directions with the rise of reasoning-optimized LLMs (e.g., OpenAI's o-series, DeepSeek R1). Thoughts?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning (2025)
- CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation (2025)
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models (2025)
- Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance (2025)
- "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding (2025)
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models (2025)
- SaRO: Enhancing LLM Safety through Reasoning-based Alignment (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper