arxiv:2503.23733

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Published on Mar 31

· Submitted by

carboncoo on Apr 2

Upvote

Authors:

Chi Chen ,

Abstract

Recently, model merging methods have demonstrated powerful strengths in combining abilities on various tasks from multiple Large Language Models (LLMs). While previous model merging methods mainly focus on merging homogeneous models with identical architecture, they meet challenges when dealing with Multimodal Large Language Models (MLLMs) with inherent heterogeneous property, including differences in model architecture and the asymmetry in the parameter space. In this work, we propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging and searching. Specifically, we first design mapping function between models to apply model merging on MLLMs with different architecture. Then we apply linear interpolation on model weights to actively adapt the asymmetry in the heterogeneous MLLMs. Finally in the hyper-parameter searching step, we propose an unsupervised hyper-parameter selection method for model merging. As the first model merging method capable of merging heterogeneous MLLMs without labeled data, extensive experiments on various model combinations demonstrated that AdaMMS outperforms previous model merging methods on various vision-language benchmarks.

View arXiv page View PDF Add to collection

Community

carboncoo

Paper author Paper submitter 14 days ago

Paper: https://arxiv.org/abs/2503.23733
Code: https://github.com/THUNLP-MT/AdaMMS

Recent advancements in model merging have shown great potential in combining capabilities from multiple large language models (LLMs). However, existing methods primarily focus on merging homogeneous models with identical architectures, struggling when applied to heterogeneous Multimodal Large Language Models (MLLMs) that differ in both architecture and parameter space.

We propose AdaMMS: Adaptive Mapping, Merging, and Searching — a novel unsupervised model merging framework tailored for heterogeneous MLLMs. AdaMMS tackles the challenges in three steps:

🧠 Mapping
Establish a mapping function between different model architectures.

⚖️ Merging
Perform weighted linear interpolation to accommodate asymmetries in parameter space.

🔍 Searching
Introduce an unsupervised hyperparameter search method to determine optimal merging coefficients.

📊 Extensive experiments show that AdaMMS consistently outperforms previous model merging methods on various vision-language benchmarks.

Here is the illustration of three steps in AdaMMS:

modricwang

14 days ago

Novel technique! 👍

librarian-bot

13 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.23733 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.23733 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.23733 in a Space README.md to link it from this page.