arxiv:2509.23379

CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

Published on Sep 27
· Submitted by Xi Zhang on Oct 8

Abstract

Clinical Contrastive Decoding (CCD) enhances radiology report generation by integrating structured clinical signals, reducing medical hallucinations without altering the base MLLM.

AI-generated summary

Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.

Community

Paper author · Paper submitter

šŸš€ CCD (Clinical Contrastive Decoding) helps radiology MLLMs generate more accurate and faithful reports by reducing medical hallucinations — all without retraining or extra data.

CCD introduces a dual contrastive decoding strategy:

  • 🩺 Symptom-grounded decoding uses reliable clinical anchors to guide generation.
  • 🧠 Expert-informed decoding adjusts token probabilities based on expert knowledge.

This lightweight, plug-and-play, training-free, and retrieval-free method improves clinical consistency and accuracy across multiple datasets and models, including MAIRA-2, Libra, and LLaVA-Rad.
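
For intuition, here is a minimal sketch of what such a dual contrastive adjustment of token-level logits could look like. The function name, the weights `alpha`/`beta`, and the way the symptom-conditioned and expert-informed logits are obtained are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def clinical_contrastive_step(base_logits, symptom_logits, expert_logits,
                              alpha=0.5, beta=0.3):
    """One decoding step of a dual contrastive logit adjustment (illustrative).

    base_logits    -- next-token logits from the base radiology MLLM
    symptom_logits -- logits from a pass conditioned on symptom-grounded clinical anchors
    expert_logits  -- logits informed by a task-specific radiology expert model
    alpha, beta    -- hypothetical weights for the two contrastive terms
    """
    # Stage 1 (symptom-grounded): pull the distribution toward the
    # anchor-conditioned prediction by contrasting it with the base prediction.
    adjusted = base_logits + alpha * (symptom_logits - base_logits)

    # Stage 2 (expert-informed): further adjust token probabilities using
    # the contrast between expert-informed and base predictions.
    adjusted = adjusted + beta * (expert_logits - base_logits)

    return F.log_softmax(adjusted, dim=-1)


# Toy usage with random tensors standing in for real model outputs.
vocab_size = 32000
base = torch.randn(vocab_size)
symptom = torch.randn(vocab_size)
expert = torch.randn(vocab_size)
next_token_id = clinical_contrastive_step(base, symptom, expert).argmax()
```

Because the adjustment happens purely at the logit level at inference time, a scheme like this leaves the base MLLM's weights untouched, which is consistent with the training-free, plug-and-play framing above.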

šŸ“ˆ On MIMIC-CXR, CCD achieves up to +17% RadGraph-F1 improvement on MAIRA-2, greatly reducing unsupported findings and ensuring clinically grounded, image-faithful outputs.

šŸ”— Resources:
šŸŒ Project:
šŸ“¦ GitHub: https://x-izhang.github.io/CCD/
šŸ¤— Models: Libra Collections
šŸ¤— Evaluation dataset: CCD Collections
šŸŽ® Online Demo: https://huggingface.co/spaces/X-iZhang/CCD

[Figure: CCD framework overview]

Paper author · Paper submitter

⚔ TL;DR: CCD is a lightweight, training-free, and retrieval-free decoding framework that substantially improves clinical accuracy and reduces medical hallucinations in radiology MLLMs such as MAIRA-2.

