VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 32
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published 17 days ago • 30
chancharikm/qwen2.5-vl-7b-cam-motion-preview Video-Text-to-Text • 8B • Updated May 2 • 2.4k • 11
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7
Thera Arbitrary-Scale Super-Resolution 🔥 Enhance image quality with real-time super-resolution • 281