Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published May 29 • 23
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models Paper • 2505.21523 • Published May 23 • 14
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123 • Published May 30 • 34
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 41
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16 • 91
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published about 1 month ago • 73
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published Jun 20 • 10
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published 28 days ago • 15
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 23 days ago • 193
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published 17 days ago • 16
Robust Multimodal Large Language Models Against Modality Conflict Paper • 2507.07151 • Published 15 days ago • 5
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published 15 days ago • 43
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers Paper • 2507.10787 • Published 9 days ago • 11