Large Multi-modal Models Can Interpret Features in Large Multi-modal Models • Paper • 2411.14982 • Published Nov 22, 2024
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration • Paper • 2411.17686 • Published Nov 26, 2024
On the Limitations of Vision-Language Models in Understanding Image Transforms • Paper • 2503.09837 • Published Mar 12, 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey • Paper • 2503.12605 • Published Mar 16, 2025
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation • Paper • 2503.16660 • Published 28 days ago
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration • Paper • 2503.12821 • Published Mar 17, 2025
Scaling Laws for Native Multimodal Models • Paper • 2504.07951 • Published 7 days ago