LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published 6 days ago
ViSpeak: Visual Instruction Feedback in Streaming Videos Paper • 2503.12769 • Published Mar 17