Abstract
Seedance 1.0 offers high-performance video generation by integrating advanced data curation, efficient architecture, post-training optimization, and model acceleration, resulting in superior quality and speed.
Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precision and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with proposed training paradigm, which allows for natively supporting multi-shot generation and jointly learning of both text-to-video and image-to-video tasks. (iii) carefully-optimized post-training approaches leveraging fine-grained supervised fine-tuning, and video-specific RLHF with multi-dimensional reward mechanisms for comprehensive performance improvements; (iv) excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution only with 41.4 seconds (NVIDIA-L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation having superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation.
Community
Seedance 1.0 Technical Report
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ContentV: Efficient Training of Video Generation Models with Limited Compute (2025)
- SkyReels-V2: Infinite-length Film Generative Model (2025)
- Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios (2025)
- Seedream 3.0 Technical Report (2025)
- HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer (2025)
- DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution (2025)
- Training-Free Efficient Video Generation via Dynamic Token Carving (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper