Temporal Preference Optimization for Long-Form Video Understanding • Paper • 2501.13919 • Published Jan 23 • 22 • 3
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models • Paper • 2501.13920 • Published Jan 23 • 17 • 2
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking • Paper • 2501.09751 • Published Jan 16 • 49 • 2
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot • Paper • 2501.09012 • Published Jan 15 • 10 • 2
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents • Paper • 2501.08828 • Published Jan 15 • 32 • 2
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion • Paper • 2501.09019 • Published Jan 15 • 12 • 2
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them • Paper • 2501.08292 • Published Jan 14 • 17 • 2
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control • Paper • 2501.01427 • Published Jan 2 • 55 • 3