PISCO: Precise Video Instance Insertion with Sparse Control Paper • 2602.08277 • Published 6 days ago • 10
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling Paper • 2602.09084 • Published 5 days ago • 26
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Paper • 2601.01321 • Published Jan 4 • 19
MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding Paper • 2507.12463 • Published Jul 16, 2025 • 27
SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems Paper • 2506.07564 • Published Jun 9, 2025 • 6
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation Paper • 2505.24073 • Published May 29, 2025
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution Paper • 2505.00687 • Published May 1, 2025
Demystifying the Visual Quality Paradox in Multimodal Large Language Models Paper • 2506.15645 • Published Jun 18, 2025 • 4
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2, 2025 • 48
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models Paper • 2404.01367 • Published Apr 1, 2024 • 22
Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models Paper • 2410.03659 • Published Oct 4, 2024 • 5
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results Paper • 2404.16205 • Published Apr 24, 2024