- CaughtCheating: Is Your MLLM a Good Cheating Detective? Exploring the Boundary of Visual Perception and Reasoning (arXiv:2507.00045, published Jun 23)
- VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding (arXiv:2508.07493, published 29 days ago)
- Where to Show Demos in Your Prompt: A Positional Bias of In-Context Learning (arXiv:2507.22887, published Jul 30)
- Skip a Layer or Loop It? Test-Time Depth Adaptation of Pretrained LLMs (arXiv:2507.07996, published Jul 10)
- Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models (arXiv:2505.21765, published May 27)
- Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation (arXiv:2506.10395, published Jun 12)
- Where to Find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test (arXiv:2506.21551, published Jun 26)
- FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing (arXiv:2506.20911, published Jun 26)
- Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency (arXiv:2506.08343, published Jun 10)
- What Makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding (arXiv:2506.06998, published Jun 8)
- Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs (arXiv:2504.20406, published Apr 29)
- VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos (arXiv:2505.01481, published May 2)
- BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568, published May 14)
- WALL-E 2.0: World Alignment by NeuroSymbolic Learning Improves World Model-based LLM Agents (arXiv:2504.15785, published Apr 22)
- GraphicBench: A Planning Benchmark for Graphic Design with Language Agents (arXiv:2504.11571, published Apr 15)
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective (arXiv:2502.14296, published Feb 20)
- AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? (arXiv:2410.21259, published Oct 28, 2024)
- ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness (arXiv:2504.10514, published Apr 10)