InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 5 days ago • 223
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 10 days ago • 9
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published 23 days ago • 31
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 26