arxiv:2501.18463

A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models

Published on Jan 30

Abstract

Three new OOD detection benchmarks (ImageNet-X, ImageNet-FS-X, and Wilds-FS-X) assess method robustness to semantic and covariate shifts, revealing limitations in CLIP-based approaches.

AI-generated summary

Out-of-distribution (OOD) detection identifies OOD samples at inference time to ensure the safety of deployed models. However, conventional benchmarks have reached performance saturation, making it difficult to compare recent OOD detection methods. To address this challenge, we introduce three novel OOD detection benchmarks that enable a deeper understanding of method characteristics and reflect real-world conditions. First, we present ImageNet-X, designed to evaluate performance under challenging semantic shifts. Second, we propose ImageNet-FS-X for full-spectrum OOD detection, assessing robustness to covariate shifts (feature distribution shifts). Finally, we propose Wilds-FS-X, which extends these evaluations to real-world datasets, offering a more comprehensive testbed. Our experiments reveal that recent CLIP-based OOD detection methods struggle to varying degrees across the three proposed benchmarks, and none of them consistently outperforms the others. We hope the community goes beyond specific benchmarks and includes more challenging conditions reflecting real-world scenarios. The code is available at https://github.com/hoshi23/OOD-X-Benchmarks.
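
To make the distinction between a conventional and a full-spectrum evaluation concrete, here is a minimal sketch, not the paper's evaluation code. It assumes the common reading of full-spectrum OOD detection, in which covariate-shifted images of in-distribution (ID) classes still count as ID, and it uses AUROC as the metric. The `auroc` helper and all score values are hypothetical and made up for illustration; they are not results from the paper or its repository.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Probability that a random ID sample scores higher than a random
    OOD sample, counting ties as half a win (pairwise AUROC)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Hypothetical detector confidence scores (higher = more ID-like).
clean_id = [0.9, 0.8, 0.85, 0.7]      # ID classes, training-like images
shifted_id = [0.6, 0.4, 0.55, 0.3]    # same ID classes, covariate-shifted
semantic_ood = [0.5, 0.2, 0.35, 0.1]  # unseen classes (semantic shift)

# Conventional setting: only clean ID samples versus semantic OOD samples.
conventional = auroc(clean_id, semantic_ood)

# Full-spectrum setting (assumed): shifted ID samples must also be kept as ID.
full_spectrum = auroc(clean_id + shifted_id, semantic_ood)

print(f"conventional AUROC:  {conventional:.3f}")
print(f"full-spectrum AUROC: {full_spectrum:.3f}")
```

In this toy data the detector separates clean ID from OOD perfectly, but the score drops once covariate-shifted ID samples are added to the ID side, which is the kind of gap a full-spectrum benchmark is meant to expose.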

Community

Paper author

This paper proposes three novel benchmarks for OOD detection: ImageNet-X, ImageNet-FS-X, and Wilds-FS-X.
We hope the community goes beyond specific benchmarks and includes more challenging conditions reflecting real-world scenarios.
