Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 7 days ago • 21
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 10 days ago • 84
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 9 days ago • 79
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 17 days ago • 89
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 17 days ago • 49
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 23 days ago • 85
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 8 days ago • 75
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 270
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 20 days ago • 42
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 16 days ago • 271
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Paper • 2411.13093 • Published Nov 20, 2024 • 1
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 20 days ago • 59
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 20 days ago • 66
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 27 days ago • 42
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 27 days ago • 31
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 23 days ago • 48
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 23 days ago • 67
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 21 days ago • 84
Enhancing Human-Like Responses in Large Language Models Paper • 2501.05032 • Published 21 days ago • 49
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 22 days ago • 90