AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

This repository contains the pre-trained weights of AccVideo. AccVideo is a novel efficient distillation method to accelerate video diffusion models with synthetic datset. Our method is 8.5x faster than HunyuanVideo.

arXivProject Page

πŸ”₯πŸ”₯πŸ”₯ News

  • Mar, 2025: We release the inference code and model weights of AccVideo.

πŸ“‘ Open-source Plan

  • Inference
  • Checkpoints
  • Multi-GPU Inference
  • Synthetic Video Dataset, SynVid
  • Training

πŸ”§ Installation

The code is tested on Python 3.10.0, CUDA 11.8 and A100.

conda create -n accvideo python==3.10.0
conda activate accvideo

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
pip install "huggingface_hub[cli]"

πŸ€— Checkpoints

To download the checkpoints, use the following command:

# Download the model weight
huggingface-cli download aejion/AccVideo --local-dir ./ckpts

πŸš€ Inference

We recommend using a GPU with 80GB of memory. To run the inference, use the following command:

export MODEL_BASE=./ckpts
python sample_t2v.py \
    --height 544 \
    --width 960 \
    --num_frames 93 \
    --num_inference_steps 50 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 7 \
    --flow-reverse \
    --prompt_file ./assets/prompt.txt \
    --seed 1024 \
    --output_path ./results/accvideo-544p \
    --model_path ./ckpts \
    --dit-weight ./ckpts/accvideo-t2v-5-steps/diffusion_pytorch_model.pt

The following table shows the comparisons on inference time using a single A100 GPU:

Model Setting(height/width/frame) Inference Time(s)
HunyuanVideo 720px1280px129f 3234
Ours 720px1280px129f 380(8.5x faster)
HunyuanVideo 544px960px93f 704
Ours 544px960px93f 91(7.7x faster)

πŸ”— BibTeX

If you find AccVideo useful for your research and applications, please cite using this BibTeX:

@article{zhang2025accvideo,
    title={AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset},
    author={Zhang, Haiyu and Chen, Xinyuan and Wang, Yaohui and Liu, Xihui and Wang, Yunhong and Qiao, Yu},
    journal={arXiv preprint arXiv:2503.19462},
    year={2025}
}

Acknowledgements

The code is built upon FastVideo and HunyuanVideo, we thank all the contributors for open-sourcing.

Downloads last month
0
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ 1 Ask for provider support