AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation Paper • 2506.10540 • Published Jun 12 • 37
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5 • 72
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25 • 43
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting Paper • 2505.18822 • Published May 24 • 14
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Paper • 2505.04410 • Published May 7 • 45
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 180
VideoVista-CulturalLingo: 360^circ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension Paper • 2504.17821 • Published Apr 23 • 23
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published Apr 22 • 64
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering Paper • 2311.07536 • Published Nov 13, 2023 • 3
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning Paper • 2406.11303 • Published Jun 17, 2024 • 3
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Paper • 2408.13239 • Published Aug 23, 2024 • 12
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation Paper • 2408.09787 • Published Aug 19, 2024 • 10