Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 2 days ago • 20
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 1 day ago • 40
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 3 days ago • 26
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training Paper • 2501.07556 • Published 5 days ago • 5
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published 4 days ago • 10
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published 4 days ago • 16
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following Paper • 2501.08187 • Published 4 days ago • 24
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Paper • 2501.06751 • Published 6 days ago • 31
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 4 days ago • 258
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 5 days ago • 45
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 7 days ago • 29
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models Paper • 2501.05767 • Published 8 days ago • 27
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 8 days ago • 55
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 11 days ago • 48
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published 10 days ago • 33
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published 10 days ago • 48