Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Paper • 2412.01268 • Published Dec 2, 2024
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning Paper • 2508.04416 • Published Aug 6, 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams Paper • 2506.23825 • Published Jun 30, 2025
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction Paper • 2412.04887 • Published Dec 6, 2024
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams Paper • 2406.08085 • Published Jun 12, 2024