VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 6 days ago
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 1 day ago
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 6 days ago
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 8 days ago
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Paper • 2504.03151 • Published 12 days ago
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 9 days ago
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published 13 days ago
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 13 days ago
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Paper • 2504.01934 • Published 14 days ago
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Paper • 2503.23733 • Published 16 days ago
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 15 days ago
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published 16 days ago
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language Paper • 2503.23730 • Published 16 days ago
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published 20 days ago
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 20 days ago
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published 20 days ago
ViLBench: A Suite for Vision-Language Process Reward Modeling Paper • 2503.20271 • Published 21 days ago