Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published 14 days ago • 9
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published 19 days ago • 39
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 22 days ago • 63
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published Apr 13 • 16
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published Apr 14 • 30
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 31
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes Paper • 2503.13435 • Published Mar 17 • 17