DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Abstract
Vision-language models (VLMs) are prone to object hallucinations, where they erroneously indicate the presence of certain objects in an image. Existing benchmarks quantify hallucinations using relatively small, labeled datasets. However, this approach is i) insufficient to assess hallucinations that arise in open-world settings, where VLMs are widely used, and ii) inadequate for detecting systematic errors in VLMs. We propose DASH (Detection and Assessment of Systematic Hallucinations), an automatic, large-scale pipeline designed to identify systematic hallucinations of VLMs on real-world images in an open-world setting. A key component is DASH-OPT for image-based retrieval, where we optimize over the "natural image manifold" to generate images that mislead the VLM. The output of DASH consists of clusters of real and semantically similar images for which the VLM hallucinates an object. We apply DASH to PaliGemma and two LLaVA-NeXT models across 380 object classes and, in total, find more than 19k clusters with 950k images. We study the transfer of the identified systematic hallucinations to other VLMs and show that fine-tuning PaliGemma with the model-specific images obtained with DASH mitigates object hallucinations. Code and data are available at https://YanNeu.github.io/DASH.
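The core idea of the pipeline described above can be sketched in a few lines: query a VLM about an object on images known not to contain it, keep the images on which the model answers "yes" (hallucinations), and group those images into clusters of semantically similar examples. The sketch below is illustrative only; the function names, the greedy clustering, and the similarity threshold are assumptions for exposition, not the authors' implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_hallucinations(images, vlm_says_present, threshold=0.9):
    """Greedy clustering of hallucination-triggering images (a toy sketch).

    images: list of (image_id, embedding) pairs for images that do NOT
        contain the queried object.
    vlm_says_present: callable image_id -> bool, a stand-in for asking
        the VLM "Is there a <object> in the image?".
    """
    # Step 1+2: keep only images on which the VLM hallucinates the object.
    hallucinated = [(i, e) for i, e in images if vlm_says_present(i)]
    # Step 3: group hallucinating images by embedding similarity.
    clusters = []
    for img_id, emb in hallucinated:
        for cluster in clusters:
            if cosine(emb, cluster[0][1]) >= threshold:
                cluster.append((img_id, emb))
                break
        else:
            clusters.append([(img_id, emb)])
    return clusters

# Toy demo with fake 2-D embeddings and a stubbed VLM answer:
fake_images = [
    ("a", [1.0, 0.0]), ("b", [0.99, 0.05]),
    ("c", [0.0, 1.0]), ("d", [0.5, 0.5]),
]
stub_vlm = lambda i: i in {"a", "b", "c"}  # pretend these trigger "yes"
clusters = cluster_hallucinations(fake_images, stub_vlm)
# "a" and "b" have near-identical embeddings and end up in one cluster;
# "c" forms its own cluster; "d" is excluded because the stub answers "no".
```

In the actual pipeline the stubbed VLM query and the toy embeddings would be replaced by real model queries and image features, and DASH-OPT additionally optimizes over the natural image manifold to retrieve misleading images at scale.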
Community
We propose DASH, a large-scale, fully automated pipeline requiring no human labeling for identifying systematic object hallucinations in VLMs.
Code and URLs for the 950K images that trigger object hallucinations are available on GitHub:
https://github.com/YanNeu/DASH
We also propose a new benchmark, DASH-B, to enable a more reliable evaluation of object hallucinations in VLMs:
https://github.com/YanNeu/DASH-B
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Seeing What's Not There: Spurious Correlation in Multimodal LLMs (2025)
- Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models (2025)
- Mitigating Hallucinations in YOLO-based Object Detection Models: A Revisit to Out-of-Distribution Detection (2025)
- DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities (2025)
- CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base (2025)
- Treble Counterfactual VLMs: A Causal Approach to Hallucination (2025)
- Understanding and Evaluating Hallucinations in 3D Visual Language Models (2025)