Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Paper • 2501.09012 • Published 3 days ago • 9
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 3 days ago • 25
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 1 day ago • 15
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 1 day ago • 21
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 1 day ago • 12
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published 1 day ago • 29
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 1 day ago • 39
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 7 days ago • 29
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 5 days ago • 72
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following Paper • 2501.08187 • Published 4 days ago • 24
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 4 days ago • 258
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 8 days ago • 32
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper • 2501.03841 • Published 11 days ago • 49
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 8 days ago • 55
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published 11 days ago • 23
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 11 days ago • 48
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 15 days ago • 31
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 15 days ago • 40