# AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

This repository is the official PyTorch implementation of [AccVideo](https://arxiv.org/abs/2503.19462). AccVideo is a novel and efficient distillation method that accelerates video diffusion models using a synthetic dataset. Our method is 8.5x faster than HunyuanVideo.

[![arXiv](https://img.shields.io/badge/arXiv-2503.19462-b31b1b.svg)](https://arxiv.org/abs/2503.19462)
[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://aejion.github.io/accvideo/)
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/aejion/AccVideo)

## 🔥🔥🔥 News

* May 26, 2025: We release the inference code and [model weights](https://huggingface.co/aejion/AccVideo-WanX-T2V-14B) of AccVideo based on WanX-T2V-14B.
* Mar 31, 2025: [ComfyUI-Kijai (FP8 Inference)](https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/accvideo-t2v-5-steps_fp8_e4m3fn.safetensors): ComfyUI integration by [Kijai](https://huggingface.co/Kijai)
* Mar 26, 2025: We release the inference code and [model weights](https://huggingface.co/aejion/AccVideo) of AccVideo based on HunyuanT2V.

## 🎥 Demo (Based on HunyuanT2V)

https://github.com/user-attachments/assets/59f3c5db-d585-4773-8d92-366c1eb040f0

## 🎥 Demo (Based on WanX-T2V-14B)

## 📑 Open-source Plan

- [x] Inference
- [x] Checkpoints
- [ ] Multi-GPU Inference
- [ ] Synthetic Video Dataset, SynVid
- [ ] Training

## 🔧 Installation

The code is tested with Python 3.10.0, CUDA 11.8, and an A100 GPU.

```bash
conda create -n accvideo python==3.10.0
conda activate accvideo

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
pip install "huggingface_hub[cli]"
```
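
To quickly sanity-check the environment (a convenience check, not part of the official setup), confirm that the CUDA build of PyTorch and the FlashAttention wheel import cleanly:

```bash
# both commands should print version info and raise no ImportError
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"
```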

## 🤗 Checkpoints

To download the checkpoints (based on HunyuanT2V), use the following command:
```bash
# Download the model weights
huggingface-cli download aejion/AccVideo --local-dir ./ckpts
```

To download the checkpoints (based on WanX-T2V-14B), use the following command:
```bash
# Download the model weights
huggingface-cli download aejion/AccVideo-WanX-T2V-14B --local-dir ./wanx_t2v_ckpts
```
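
After both downloads finish, you can verify that the weights landed where the inference commands below expect them (the `.pt` path is taken from the HunyuanT2V command; treat this as a convenience sketch):

```bash
# the HunyuanT2V run below points --dit-weight at this file
test -f ./ckpts/accvideo-t2v-5-steps/diffusion_pytorch_model.pt \
  && echo "HunyuanT2V distilled weights: OK"

# the WanX run reads everything from --ckpt_dir
ls ./wanx_t2v_ckpts
```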

## 🚀 Inference

We recommend using a GPU with 80GB of memory. We use AccVideo to distill Hunyuan and WanX.

### Inference for HunyuanT2V

To run inference, use the following command:
```bash
export MODEL_BASE=./ckpts
python sample_t2v.py \
    --height 544 \
    --width 960 \
    --num_frames 93 \
    --num_inference_steps 5 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 7 \
    --flow-reverse \
    --prompt_file ./assets/prompt.txt \
    --seed 1024 \
    --output_path ./results/accvideo-544p \
    --model_path ./ckpts \
    --dit-weight ./ckpts/accvideo-t2v-5-steps/diffusion_pytorch_model.pt
```
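
`--prompt_file` points at the released `./assets/prompt.txt`. Assuming it follows the usual one-prompt-per-line convention (an assumption; check the shipped file), you can substitute your own prompt list:

```bash
# hypothetical custom prompt list, one prompt per line (assumed format)
cat > ./assets/my_prompts.txt << 'EOF'
A cat walks on the grass, realistic style.
A drone shot of waves crashing against rugged cliffs at golden hour.
EOF
```

Then rerun the command above with `--prompt_file ./assets/my_prompts.txt`, keeping the remaining flags unchanged.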

The following table compares inference time on a single A100 GPU:

| Model        | Setting (H×W×Frames) | Inference Time (s) |
|:------------:|:--------------------:|:------------------:|
| HunyuanVideo | 720×1280×129         | 3234               |
| Ours         | 720×1280×129         | 380 (8.5x faster)  |
| HunyuanVideo | 544×960×93           | 704                |
| Ours         | 544×960×93           | 91 (7.7x faster)   |

### Inference for WanX-T2V

To run inference, use the following command:
```bash
python sample_wanx_t2v.py \
    --task t2v-14B \
    --size 832*480 \
    --ckpt_dir ./wanx_t2v_ckpts \
    --sample_solver 'unipc' \
    --save_dir ./results/accvideo_wanx_14B \
    --sample_steps 10
```
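
To trade quality against speed, you can sweep `--sample_steps` (a sketch reusing the flags above; the step counts and per-run output directories are hypothetical choices, not documented settings):

```bash
# sweep the number of sampling steps; all other flags as in the command above
for steps in 5 10 20; do
  python sample_wanx_t2v.py \
    --task t2v-14B \
    --size 832*480 \
    --ckpt_dir ./wanx_t2v_ckpts \
    --sample_solver 'unipc' \
    --save_dir ./results/accvideo_wanx_14B_${steps}steps \
    --sample_steps ${steps}
done
```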

The following table compares inference time on a single A100 GPU:

| Model | Setting (H×W×Frames) | Inference Time (s) |
|:-----:|:--------------------:|:------------------:|
| WanX  | 480×832×81           | 932                |
| Ours  | 480×832×81           | 97 (9.6x faster)   |

## 🔗 BibTeX

If you find [AccVideo](https://arxiv.org/abs/2503.19462) useful for your research and applications, please cite it using this BibTeX:

```BibTeX
@article{zhang2025accvideo,
  title={AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset},
  author={Zhang, Haiyu and Chen, Xinyuan and Wang, Yaohui and Liu, Xihui and Wang, Yunhong and Qiao, Yu},
  journal={arXiv preprint arXiv:2503.19462},
  year={2025}
}
```

## Acknowledgements

The code is built upon [FastVideo](https://github.com/hao-ai-lab/FastVideo) and [HunyuanVideo](https://github.com/Tencent/HunyuanVideo). We thank all the contributors for open-sourcing their work.