Submitted by xhyandwyy 38 Mobile-Agent-v3: Foundamental Agents for GUI Automation · 15 authors 4.73k 2
Submitted by Kevin355 23 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries · 14 authors 2
Submitted by haoningwu 10 SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass · 4 authors 22 1
Submitted by cai-qi 5 Visual Autoregressive Modeling for Instruction-Guided Image Editing · 8 authors 9 1
Submitted by taesiri 5 ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling · 10 authors 1
Submitted by universea 4 aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists · 23 authors 1
Submitted by taesiri 3 "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries · 10 authors 1
Submitted by taesiri 2 When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding · 3 authors 1
Submitted by amazingj 2 Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models · 7 authors 1
Submitted by thewhole 2 Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds · 9 authors 1
Submitted by YirongSun - LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model · 8 authors 12 1