Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding Paper • 2505.18079 • Published 22 days ago
Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators Paper • 2306.01242 • Published Jun 2, 2023
Unifying Layout Generation with a Decoupled Diffusion Model Paper • 2303.05049 • Published Mar 9, 2023
Understanding Mobile GUI: from Pixel-Words to Screen-Sentences Paper • 2105.11941 • Published May 25, 2021
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API Paper • 2310.04716 • Published Oct 7, 2023