AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
Abstract
Recently, model merging methods have demonstrated powerful strengths in combining abilities on various tasks from multiple Large Language Models (LLMs). Previous model merging methods mainly focus on merging homogeneous models with identical architectures, and they face challenges when dealing with Multimodal Large Language Models (MLLMs), which are inherently heterogeneous: they differ in model architecture and exhibit asymmetry in the parameter space. In this work, we propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles these challenges in three steps: mapping, merging, and searching. Specifically, we first design a mapping function between models so that model merging can be applied to MLLMs with different architectures. Then we apply linear interpolation on model weights to actively adapt to the asymmetry in the heterogeneous MLLMs. Finally, in the hyper-parameter searching step, we propose an unsupervised hyper-parameter selection method for model merging. AdaMMS is the first model merging method capable of merging heterogeneous MLLMs without labeled data, and extensive experiments on various model combinations demonstrate that it outperforms previous model merging methods on various vision-language benchmarks.
Community
Paper: https://arxiv.org/abs/2503.23733
Code: https://github.com/THUNLP-MT/AdaMMS
Recent advancements in model merging have shown great potential in combining capabilities from multiple large language models (LLMs). However, existing methods primarily focus on merging homogeneous models with identical architectures, struggling when applied to heterogeneous Multimodal Large Language Models (MLLMs) that differ in both architecture and parameter space.
We propose AdaMMS (Adaptive Mapping, Merging, and Searching), a novel unsupervised model merging framework tailored for heterogeneous MLLMs. AdaMMS tackles the challenges in three steps:
Mapping
Establish a mapping function between different model architectures.
Merging
Perform weighted linear interpolation to accommodate asymmetries in parameter space.
Searching
Introduce an unsupervised hyperparameter search method to determine optimal merging coefficients.
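The three steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: weights are plain Python lists rather than tensors, `name_map` is a hypothetical hand-written parameter-name mapping, the interpolation convention `(1 - alpha) * target + alpha * source` is an assumption, and `score` is a placeholder for whatever label-free criterion is used to rank candidate coefficients (the actual AdaMMS criterion is described in the paper and is not reproduced here).

```python
from typing import Callable, Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values (toy stand-in for tensors)


def map_parameters(source: Weights, name_map: Dict[str, str]) -> Weights:
    """Mapping: rename source parameters into the target model's naming scheme.

    Source parameters without an entry in `name_map` (hypothetical) are dropped.
    """
    return {name_map[name]: values for name, values in source.items() if name in name_map}


def merge(target: Weights, mapped: Weights, alpha: float) -> Weights:
    """Merging: linear interpolation on parameters that both models share.

    Parameters that exist only in the target (or whose sizes differ) are copied
    from the target unchanged. This fallback is an assumption of this sketch.
    """
    merged: Weights = {}
    for name, t_vals in target.items():
        s_vals = mapped.get(name)
        if s_vals is None or len(s_vals) != len(t_vals):
            merged[name] = list(t_vals)
        else:
            merged[name] = [(1 - alpha) * t + alpha * s for t, s in zip(t_vals, s_vals)]
    return merged


def search_alpha(
    target: Weights,
    mapped: Weights,
    candidates: List[float],
    score: Callable[[Weights], float],
) -> float:
    """Searching: return the candidate coefficient whose merged model scores highest.

    `score` must not require labels; here it is an arbitrary placeholder.
    """
    return max(candidates, key=lambda alpha: score(merge(target, mapped, alpha)))


# Toy usage with made-up parameter names and values.
target = {"llm.w": [0.0, 2.0], "vision.w": [1.0]}
source = {"model.w": [4.0, 6.0], "extra.w": [9.0]}
mapped = map_parameters(source, {"model.w": "llm.w"})
half = merge(target, mapped, 0.5)
best = search_alpha(
    target,
    mapped,
    [0.0, 0.25, 0.5, 0.75, 1.0],
    score=lambda w: -abs(w["llm.w"][0] - 1.0),  # placeholder criterion for illustration only
)
```

In a real setting the dictionaries would be model state dicts holding tensors, and the candidate grid and scoring function would be chosen to suit the models being merged.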
Extensive experiments show that AdaMMS consistently outperforms previous model merging methods on various vision-language benchmarks.
Novel technique!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models (2025)
- 1bit-Merging: Dynamic Quantized Merging for Large Language Models (2025)
- Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation (2025)
- Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation (2025)
- Scalable Model Merging with Progressive Layer-wise Distillation (2025)
- Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging (2025)
- LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging (2025)