Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15 • 2
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published Jun 1 • 9
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published May 27 • 39
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published May 24 • 65
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published Apr 13 • 16
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published Apr 14 • 30
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 31