The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking Paper • 2502.09083
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Paper • 2502.08826
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Paper • 2502.11196
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802