arxiv:2507.07015

MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

Published on Jul 9 · Submitted by Gray1y on Jul 17
Authors: Hui Li et al.

Abstract

AI-generated summary: MST-Distill, a novel cross-modal knowledge distillation framework, uses a mixture of specialized teachers and an instance-level routing network to address distillation path selection and knowledge drift, outperforming existing methods across multimodal datasets.

Knowledge distillation, as an efficient knowledge transfer technique, has achieved remarkable success in unimodal scenarios. However, in cross-modal settings, conventional distillation methods encounter significant challenges due to data and statistical heterogeneities, failing to leverage the complementary prior knowledge embedded in cross-modal teacher models. This paper empirically reveals two critical issues in existing approaches: distillation path selection and knowledge drift. To address these limitations, we propose MST-Distill, a novel cross-modal knowledge distillation framework featuring a mixture of specialized teachers. Our approach employs a diverse ensemble of teacher models across both cross-modal and multimodal configurations, integrated with an instance-level routing network that facilitates adaptive and dynamic distillation. This architecture effectively transcends the constraints of traditional methods that rely on monotonous and static teacher models. Additionally, we introduce a plug-in masking module, independently trained to suppress modality-specific discrepancies and reconstruct teacher representations, thereby mitigating knowledge drift and enhancing transfer effectiveness. Extensive experiments across five diverse multimodal datasets, spanning visual, audio, and text, demonstrate that our method significantly outperforms existing state-of-the-art knowledge distillation methods in cross-modal distillation tasks. The source code is available at https://github.com/Gray-OREO/MST-Distill.
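To make the instance-level routing idea concrete, here is a minimal PyTorch sketch of per-instance gating over a mixture of teachers. The module names (`TeacherRouter`, `routed_distillation_loss`), the gate architecture, and the temperature-scaled KL mixing are illustrative assumptions, not the paper's implementation; the actual MST-Distill code is in the linked repository.

```python
# Minimal sketch: a routing network produces per-instance weights over K
# specialized teachers, and the distillation loss is mixed accordingly.
# All names and design choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherRouter(nn.Module):
    """Produces per-instance weights over K specialized teachers."""
    def __init__(self, feat_dim: int, num_teachers: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, num_teachers),
        )

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        # (B, K): softmax so each instance's teacher weights sum to 1
        return F.softmax(self.gate(student_feat), dim=-1)

def routed_distillation_loss(student_logits, teacher_logits_list, weights, T=4.0):
    """Temperature-scaled KL distillation, mixed per instance by the router.

    teacher_logits_list: list of K tensors of shape (B, C)
    weights:             tensor of shape (B, K) from TeacherRouter
    """
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for k, t_logits in enumerate(teacher_logits_list):
        p_t = F.softmax(t_logits / T, dim=-1)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1)  # per-sample KL, shape (B,)
        loss = loss + (weights[:, k] * kl).mean()
    return (T ** 2) * loss
```

In this sketch the router is conditioned on the student's features, so different instances can draw on different cross-modal or multimodal teachers rather than a single static one.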

Community

Paper author and submitter

We are excited to share our latest work on cross-modal knowledge distillation:

Paper Title: MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

arXiv Link: https://arxiv.org/abs/2507.07015

Acceptance Status: Accepted by ACM MM 2025 ✅

Key Contributions:

  • Proposed the MST-Distill framework, built on a novel mixture of specialized teachers for cross-modal knowledge distillation
  • Introduced an instance-level routing network for adaptive and dynamic distillation
  • Designed a plug-in masking module to mitigate knowledge drift (a rough sketch follows this list)
  • Significantly outperformed existing state-of-the-art methods on 5 multimodal datasets
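For intuition on the masking module, below is a speculative PyTorch sketch of a plug-in soft mask over teacher features. The specific objective shown (reconstruction plus alignment toward the student's feature space) and all names (`PlugInMask`, `mask_step`, `decoder`, `lam`) are assumptions for illustration only; the paper's exact formulation lives in the repository linked below.

```python
# Speculative sketch: a separately trained soft mask over teacher features,
# kept reconstructable while down-weighting modality-specific components.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlugInMask(nn.Module):
    """Element-wise soft mask applied to a teacher's feature vector."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, teacher_feat):
        mask = self.gate(teacher_feat)          # values in [0, 1]
        return teacher_feat * mask, mask

def mask_step(mask_module, decoder, teacher_feat, student_feat, optimizer, lam=0.1):
    """One training step, independent of student training: keep the masked
    features reconstructable (decoder) while pulling them toward the
    student's feature space (alignment term). `decoder` can be, e.g.,
    nn.Linear(feat_dim, feat_dim) optimized together with the mask."""
    optimizer.zero_grad()
    masked, _ = mask_module(teacher_feat.detach())
    recon_loss = F.mse_loss(decoder(masked), teacher_feat.detach())
    align_loss = F.mse_loss(masked, student_feat.detach())
    loss = recon_loss + lam * align_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```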

Code: https://github.com/Gray-OREO/MST-Distill

We believe this work will be valuable to the cross-modal learning community!
