CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding
Abstract
Clinical Contrastive Decoding (CCD) enhances radiology report generation by integrating structured clinical signals, reducing medical hallucinations without altering the base MLLM.
Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.
Community
CCD (Clinical Contrastive Decoding) helps radiology MLLMs generate more accurate and faithful reports by reducing medical hallucinations, all without retraining or extra data.
CCD introduces a dual contrastive decoding strategy:
- Symptom-grounded decoding uses reliable clinical anchors to guide generation.
- Expert-informed decoding adjusts token probabilities based on expert knowledge.
This lightweight, plug-and-play, training-free, and retrieval-free method improves clinical consistency and accuracy across multiple datasets and models, including MAIRA-2, Libra, and LLaVA-Rad.
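The two stages above can be sketched as a token-level logit adjustment. The following is a minimal illustrative sketch, not the paper's exact formulation: `alpha`, `beta`, and the function names are hypothetical, and the interpolation form is one common way contrastive decoding is implemented.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ccd_step(base_logits, symptom_logits, expert_scores, alpha=0.5, beta=0.3):
    """One decoding step of a hypothetical dual-stage contrastive adjustment.

    base_logits:    logits from the MLLM conditioned on the image and prompt
    symptom_logits: logits when the prompt also includes symptom anchors
    expert_scores:  per-token support scores in [0, 1] from an expert model
    """
    # Stage 1 (symptom-grounded): amplify what the symptom-anchored pass
    # adds relative to the base pass.
    contrasted = [(1 + alpha) * s - alpha * b
                  for s, b in zip(symptom_logits, base_logits)]
    # Stage 2 (expert-informed): shift mass toward tokens the expert
    # model marks as clinically supported.
    adjusted = [c + beta * e for c, e in zip(contrasted, expert_scores)]
    return softmax(adjusted)
```

For example, with a three-token vocabulary where the expert model supports only the second token, the adjusted distribution concentrates on that token, illustrating how structured clinical signals reshape generation without touching the base model's weights.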
On MIMIC-CXR, CCD achieves up to a +17% RadGraph-F1 improvement on MAIRA-2, greatly reducing unsupported findings and ensuring clinically grounded, image-faithful outputs.
Resources:
- Project page: https://x-izhang.github.io/CCD/
- Models: Libra Collections
- Evaluation dataset: CCD Collections
- Online Demo: https://huggingface.co/spaces/X-iZhang/CCD
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs (2025)
- Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors (2025)
- Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding (2025)
- D-LEAF: Localizing and Correcting Hallucinations in Multimodal LLMs via Layer-to-head Attention Diagnostics (2025)
- Improving Alignment in LVLMs with Debiased Self-Judgment (2025)
- MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models (2025)
- Exploring and Mitigating Fawning Hallucinations in Large Language Models (2025)