DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks Paper • 2509.01396 • Published 8 days ago • 51
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published 7 days ago • 28
Mobile-Agent-v3: Foundamental Agents for GUI Automation Paper • 2508.15144 • Published 19 days ago • 61
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published 14 days ago • 182
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published 13 days ago • 39
VertexRegen: Mesh Generation with Continuous Level of Detail Paper • 2508.09062 • Published 28 days ago • 35
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension Paper • 2508.01959 • Published Aug 3 • 57
ForCenNet: Foreground-Centric Network for Document Image Rectification Paper • 2507.19804 • Published Jul 26 • 11
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 18
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 5 • 28
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 14
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47