UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published 11 days ago • 58
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? Paper • 2506.02161 • Published 12 days ago • 12
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10 • 28
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 128
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Paper • 2504.15280 • Published Apr 21 • 23
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published Mar 23 • 19
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10 • 86
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 124
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9 • 30
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9 • 11
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10 • 62
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published Mar 4 • 25
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence Paper • 2502.13943 • Published Feb 19 • 8