LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published 18 days ago • 35
HumanMM: Global Human Motion Recovery from Multi-shot Videos Paper • 2503.07597 • Published Mar 10 • 2
Article MotionLCM-V2: Improved Compression Rate for Multi-Latent-Token Diffusion By wxDai • Dec 11, 2024 • 17
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024 • 20
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 15 • 3
MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms Paper • 2410.18977 • Published Oct 24, 2024 • 15