Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 2025 • 128
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Paper • 2509.20360 • Published Sep 24, 2025 • 17
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation Paper • 2307.01831 • Published Jul 4, 2023 • 8
Mask-Attention-Free Transformer for 3D Instance Segmentation Paper • 2309.01692 • Published Sep 4, 2023 • 1
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17, 2025 • 41
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22, 2025 • 24
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26, 2025 • 56
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16, 2025 • 35
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging Paper • 2503.02783 • Published Mar 4, 2025 • 6
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024 • 2
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers Paper • 2501.03931 • Published Jan 7, 2025 • 15
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 48