OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published 5 days ago • 67
MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Paper • 2502.12558 • Published Feb 18
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Paper • 2502.06788 • Published Feb 10 • 13
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published Dec 19, 2024 • 55
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 37
Efficient Multimodal Learning from Data-centric Perspective Paper • 2402.11530 • Published Feb 18, 2024 • 1
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11, 2024 • 19