arXiv:2503.01334

Composed Multi-modal Retrieval: A Survey of Approaches and Applications

Published on March 3, 2025

AI-generated summary

Composed Multi-modal Retrieval (CMR) combines a reference visual input with textual modifications to enhance search flexibility and precision; this survey reviews its supervised, zero-shot, and semi-supervised learning paradigms and its applications.

Abstract

With the rapid growth of multi-modal data from social media, short-video platforms, and e-commerce, content-based retrieval has become essential for efficiently searching and utilizing heterogeneous information. Over time, retrieval techniques have evolved from Unimodal Retrieval (UR) to Cross-modal Retrieval (CR) and, more recently, to Composed Multi-modal Retrieval (CMR). CMR enables users to retrieve images or videos by integrating a reference visual input with textual modifications, enhancing search flexibility and precision. This paper provides a comprehensive review of CMR, covering its fundamental challenges, technical advancements, and categorization into supervised, zero-shot, and semi-supervised learning paradigms. We discuss key research directions, including data augmentation, model architecture, and loss optimization in supervised CMR, as well as transformation frameworks and external knowledge integration in zero-shot CMR. Additionally, we highlight the application potential of CMR in composed image retrieval, video retrieval, and person retrieval, which have significant implications for e-commerce, online search, and public security. Given its ability to refine and personalize search experiences, CMR is poised to become a pivotal technology in next-generation retrieval systems. A curated list of related works and resources is available at: https://github.com/kkzhang95/Awesome-Composed-Multi-modal-Retrieval
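To make the task concrete, below is a minimal sketch of composed image retrieval via simple late fusion of CLIP embeddings. This is only an illustration of the query formulation the abstract describes, not a method from the survey; the checkpoint name (openai/clip-vit-base-patch32), the additive fusion, and the helper names (embed_image, composed_query, retrieve) are assumptions for demonstration.

```python
# Minimal sketch of composed image retrieval: fuse a reference image with a
# textual modification into a single query embedding, then rank a gallery of
# candidate images by cosine similarity. Requires: torch, transformers, Pillow.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(image: Image.Image) -> torch.Tensor:
    # Encode an image into a unit-normalized CLIP embedding.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feat = model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

def embed_text(text: str) -> torch.Tensor:
    # Encode the textual modification into a unit-normalized CLIP embedding.
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feat = model.get_text_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

def composed_query(reference: Image.Image, modification: str) -> torch.Tensor:
    # Naive additive late fusion of the two modalities. Real CMR systems
    # learn this combination instead of averaging fixed embeddings.
    q = embed_image(reference) + embed_text(modification)
    return q / q.norm(dim=-1, keepdim=True)

def retrieve(query: torch.Tensor, gallery: torch.Tensor, k: int = 5) -> list[int]:
    # Cosine similarity against pre-computed, normalized gallery embeddings;
    # returns the indices of the top-k candidate images.
    scores = (query @ gallery.T).squeeze(0)
    return scores.topk(k).indices.tolist()

# Example usage (reference image plus a modification such as
# "the same dress but in red"):
#   query = composed_query(Image.open("reference.jpg"), "the same dress but in red")
#   top_k = retrieve(query, gallery_embeddings)
```

In practice, as the survey's taxonomy suggests, supervised CMR methods replace the naive fusion above with a learned compositor network trained on (reference, modification, target) triplets, while zero-shot approaches rely on transformation frameworks, for example mapping the reference image into the text embedding space rather than averaging fixed embeddings.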
