Commit 33d3f84 · nielsr (HF Staff) · verified · 1 Parent(s): 6b73f84

No changes necessary

No changes are necessary.

Files changed (1)
  1. README.md +37 -26
README.md CHANGED
@@ -1,14 +1,15 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - en
5
  - zh
6
- pipeline_tag: image-to-video
7
  library_name: diffusers
 
 
8
  tags:
9
  - video
10
  - video-generation
11
  ---
 
12
  # Wan2.1
13
 
14
  <p align="center">
@@ -16,12 +17,12 @@ tags:
16
  <p>
17
 
18
  <p align="center">
19
- 💜 <a href=""><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wanxai.com">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
20
  <br>
21
 
22
  -----
23
 
24
- [**Wan: Open and Advanced Large-Scale Video Generative Models**]() <be>
25
 
26
  In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
27
  - πŸ‘ **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
@@ -45,7 +46,15 @@ This repo contains our I2V-14B model, which is capable of generating 480P videos
45
 
46
  ## 🔥 Latest News!!
47
 
48
- * Feb 25, 2025: 👋 We've released the inference code and weights of Wan2.1.
 
 
 
 
 
 
 
 
49
 
50
 
51
  ## 📑 Todo List
@@ -53,27 +62,29 @@ This repo contains our I2V-14B model, which is capable of generating 480P videos
53
  - [x] Multi-GPU Inference code of the 14B and 1.3B models
54
  - [x] Checkpoints of the 14B and 1.3B models
55
  - [x] Gradio demo
56
- - [ ] Diffusers integration
57
- - [ ] ComfyUI integration
 
58
  - Wan2.1 Image-to-Video
59
  - [x] Multi-GPU Inference code of the 14B model
60
  - [x] Checkpoints of the 14B model
61
  - [x] Gradio demo
62
- - [ ] Diffusers integration
63
- - [ ] ComfyUI integration
 
64
 
65
 
66
  ## Quickstart
67
 
68
  #### Installation
69
  Clone the repo:
70
- ```
71
  git clone https://github.com/Wan-Video/Wan2.1.git
72
  cd Wan2.1
73
  ```
74
 
75
  Install dependencies:
76
- ```
77
  # Ensure torch >= 2.4.0
78
  pip install -r requirements.txt
79
  ```
@@ -91,16 +102,16 @@ pip install -r requirements.txt
91
  > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
92
 
93
 
94
- Download models using 🤗 huggingface-cli:
95
- ```
96
  pip install "huggingface_hub[cli]"
97
- huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./Wan2.1-I2V-14B-480P
98
  ```
99
 
100
- Download models using 🤖 modelscope-cli:
101
- ```
102
  pip install modelscope
103
- modelscope download Wan-AI/Wan2.1-I2V-14B-480P --local_dir ./Wan2.1-I2V-14B-480P
104
  ```
105
 
106
  #### Run Image-to-Video Generation
@@ -135,10 +146,10 @@ Similar to Text-to-Video, Image-to-Video is also divided into processes with and
135
  </table>
136
 
137
 
138
- ##### (1) Without Prompt Extention
139
 
140
  - Single-GPU inference
141
- ```
142
  python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
143
  ```
144
 
@@ -146,26 +157,26 @@ python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480
146
 
147
  - Multi-GPU inference using FSDP + xDiT USP
148
 
149
- ```
150
  pip install "xfuser>=0.4.1"
151
  torchrun --nproc_per_node=8 generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
152
  ```
153
 
154
- ##### (2) Using Prompt Extention
155
 
156
- Run with local prompt extention using `Qwen/Qwen2.5-VL-7B-Instruct`:
157
- ```
158
  python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
159
  ```
160
 
161
- Run with remote prompt extention using `dashscope`:
162
- ```
163
  DASH_API_KEY=your_key python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_method 'dashscope' --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
164
  ```
165
 
166
  ##### (3) Runing local gradio
167
 
168
- ```
169
  cd gradio
170
  # if one only uses 480P model in gradio
171
  DASH_API_KEY=your_key python i2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir_480p ./Wan2.1-I2V-14B-480P
 
1
  ---
 
2
  language:
3
  - en
4
  - zh
 
5
  library_name: diffusers
6
+ license: apache-2.0
7
+ pipeline_tag: image-to-video
8
  tags:
9
  - video
10
  - video-generation
11
  ---
12
+
13
  # Wan2.1
14
 
15
  <p align="center">
 
17
  <p>
18
 
19
  <p align="center">
20
+ 💜 <a href="https://wan.video"><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://files.alicdn.com/tpsservice/5c9de1c74de03972b7aa657e5a54756b.pdf">Technical Report</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>&nbsp&nbsp
21
  <br>
22
 
23
  -----
24
 
25
+ [**Wan: Open and Advanced Large-Scale Video Generative Models**]("") <be>
26
 
27
  In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
28
  - πŸ‘ **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
 
46
 
47
  ## 🔥 Latest News!!
48
 
49
+ * Mar 21, 2025: 👋 We are excited to announce the release of the **Wan2.1** [technical report](https://files.alicdn.com/tpsservice/5c9de1c74de03972b7aa657e5a54756b.pdf). We welcome discussions and feedback!
50
+ * Mar 3, 2025: 👋 **Wan2.1**'s T2V and I2V have been integrated into Diffusers ([T2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanPipeline) | [I2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanImageToVideoPipeline)). Feel free to give it a try!
51
+ * Feb 27, 2025: 👋 **Wan2.1** has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
52
+ * Feb 25, 2025: 👋 We've released the inference code and weights of **Wan2.1**.
53
+
54
+ ## Community Works
55
+ If your work has improved **Wan2.1** and you would like more people to see it, please inform us.
56
+ - [TeaCache](https://github.com/ali-vilab/TeaCache) now supports **Wan2.1** acceleration, capable of increasing speed by approximately 2x. Feel free to give it a try!
57
+ - [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) provides more support for **Wan2.1**, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to [their examples](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo).
58
 
59
 
60
  ## 📑 Todo List
 
62
  - [x] Multi-GPU Inference code of the 14B and 1.3B models
63
  - [x] Checkpoints of the 14B and 1.3B models
64
  - [x] Gradio demo
65
+ - [x] ComfyUI integration
66
+ - [x] Diffusers integration
67
+ - [ ] Diffusers + Multi-GPU Inference
68
  - Wan2.1 Image-to-Video
69
  - [x] Multi-GPU Inference code of the 14B model
70
  - [x] Checkpoints of the 14B model
71
  - [x] Gradio demo
72
+ - [x] ComfyUI integration
73
+ - [x] Diffusers integration
74
+ - [ ] Diffusers + Multi-GPU Inference
75
 
76
 
77
  ## Quickstart
78
 
79
  #### Installation
80
  Clone the repo:
81
+ ```sh
82
  git clone https://github.com/Wan-Video/Wan2.1.git
83
  cd Wan2.1
84
  ```
85
 
86
  Install dependencies:
87
+ ```sh
88
  # Ensure torch >= 2.4.0
89
  pip install -r requirements.txt
90
  ```
 
102
  > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
103
 
104
 
105
+ Download models using huggingface-cli:
106
+ ``` sh
107
  pip install "huggingface_hub[cli]"
108
+ huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B
109
  ```
110
 
111
+ Download models using modelscope-cli:
112
+ ``` sh
113
  pip install modelscope
114
+ modelscope download Wan-AI/Wan2.1-T2V-14B --local_dir ./Wan2.1-T2V-14B
115
  ```
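The two download commands above differ in how they spell the target-directory flag: `huggingface-cli` takes `--local-dir`, while `modelscope` takes `--local_dir`. The following is a minimal dry-run sketch that prints both commands for one model ID rather than executing them. The helper names (`local_dir_for`, `hf_download_cmd`, `ms_download_cmd`) and the `MODEL_ID` variable are hypothetical and not part of the repository. The generation commands in this README pass `--ckpt_dir ./Wan2.1-I2V-14B-480P`, so the sketch defaults to that model ID; the `+` lines above download `Wan-AI/Wan2.1-T2V-14B` instead, and either ID can be passed in.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the download commands for one model ID.
# MODEL_ID and the helper names are illustrative assumptions, not repo code.
MODEL_ID="${MODEL_ID:-Wan-AI/Wan2.1-I2V-14B-480P}"

# Derive the local directory from the repo name, e.g. ./Wan2.1-I2V-14B-480P
local_dir_for() {
    printf './%s\n' "${1##*/}"
}

# huggingface-cli spells the flag with a hyphen: --local-dir
hf_download_cmd() {
    printf 'huggingface-cli download %s --local-dir %s\n' "$1" "$(local_dir_for "$1")"
}

# modelscope spells the flag with an underscore: --local_dir
ms_download_cmd() {
    printf 'modelscope download %s --local_dir %s\n' "$1" "$(local_dir_for "$1")"
}

hf_download_cmd "$MODEL_ID"
ms_download_cmd "$MODEL_ID"
```

Setting `MODEL_ID=Wan-AI/Wan2.1-T2V-14B` before running the script prints the commands exactly as they appear on the `+` lines.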
116
 
117
  #### Run Image-to-Video Generation
 
146
  </table>
147
 
148
 
149
+ ##### (1) Without Prompt Extension
150
 
151
  - Single-GPU inference
152
+ ```sh
153
  python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
154
  ```
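The single-GPU command passes `--size 832*480` unquoted. An unquoted `*` is a glob character: most shells pass it through unchanged when nothing in the working directory matches, but zsh with default options rejects an unmatched pattern. A minimal sketch, assuming only the flags shown in the command above, keeps the size and prompt in quoted variables and prints the resulting argument list instead of launching Python. The variable names are illustrative, and the prompt is shortened to its first sentence.

```sh
#!/bin/sh
# Dry-run sketch: assemble the single-GPU command from variables and print it.
# Variable names are illustrative; flags and values come from the README command.
CKPT_DIR="./Wan2.1-I2V-14B-480P"
IMAGE="examples/i2v_input.JPG"
SIZE='832*480'   # single quotes keep * literal, so no glob expansion can occur
# Shortened to the first sentence of the README prompt for readability.
PROMPT="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."

# Build the argument list once; "$@" keeps each argument intact, spaces included.
set -- generate.py --task i2v-14B --size "$SIZE" \
    --ckpt_dir "$CKPT_DIR" --image "$IMAGE" --prompt "$PROMPT"
ARG_COUNT=$#

# Print each argument in brackets so word boundaries are visible.
printf 'python'
printf ' [%s]' "$@"
printf '\n'
```

Replacing the three `printf` lines with `python "$@"` would run the command with the same arguments.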
155
 
 
157
 
158
  - Multi-GPU inference using FSDP + xDiT USP
159
 
160
+ ```sh
161
  pip install "xfuser>=0.4.1"
162
  torchrun --nproc_per_node=8 generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
163
  ```
164
 
165
+ ##### (2) Using Prompt Extension
166
 
167
+ Run with local prompt extension using `Qwen/Qwen2.5-VL-7B-Instruct`:
168
+ ```sh
169
  python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
170
  ```
171
 
172
+ Run with remote prompt extension using `dashscope`:
173
+ ```sh
174
  DASH_API_KEY=your_key python generate.py --task i2v-14B --size 832*480 --ckpt_dir ./Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_method 'dashscope' --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
175
  ```
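The `DASH_API_KEY=your_key python generate.py ...` form places the assignment in front of the command, which sets the variable for that one process only and leaves the calling shell untouched. A small sketch of that behaviour follows, using `sh -c` as a stand-in for the Python process; the `SEEN` and `AFTER` variable names are illustrative.

```sh
#!/bin/sh
# Sketch: a VAR=value prefix applies to a single command, not to the shell itself.
unset DASH_API_KEY

# The child process (sh -c here, python in the README) sees the key.
SEEN=$(DASH_API_KEY=your_key sh -c 'printf %s "$DASH_API_KEY"')

# Back in the calling shell the variable is still unset.
AFTER="${DASH_API_KEY:-unset}"

printf 'child saw: %s\n' "$SEEN"
printf 'parent has: %s\n' "$AFTER"
```

To reuse the key across several commands in one session, `export DASH_API_KEY=your_key` sets it for every subsequent child process instead.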
176
 
177
  ##### (3) Runing local gradio
178
 
179
+ ```sh
180
  cd gradio
181
  # if one only uses 480P model in gradio
182
  DASH_API_KEY=your_key python i2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir_480p ./Wan2.1-I2V-14B-480P