Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 19
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 21
On the Limitations of Vision-Language Models in Understanding Image Transforms Paper • 2503.09837 • Published Mar 12 • 10
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 36
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20 • 73
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration Paper • 2503.12821 • Published Mar 17 • 9
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 29
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models Paper • 2505.14071 • Published May 20 • 1
To Trust Or Not To Trust Your Vision-Language Model's Prediction Paper • 2505.23745 • Published May 29 • 5
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning Paper • 2506.04755 • Published Jun 5 • 37
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10 • 35
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
Robust Multimodal Large Language Models Against Modality Conflict Paper • 2507.07151 • Published Jul 9 • 5
Automating Steering for Safe Multimodal Large Language Models Paper • 2507.13255 • Published 29 days ago • 3
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios Paper • 2507.20198 • Published 20 days ago • 25
Adapting Vision-Language Models Without Labels: A Comprehensive Survey Paper • 2508.05547 • Published 8 days ago • 10
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Paper • 2508.04280 • Published 10 days ago • 34