Add `library_name: diffusers` and comprehensive usage instructions

#3
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +94 -3
README.md CHANGED
@@ -1,9 +1,19 @@
1
  ---
2
- license: apache-2.0
3
  base_model:
4
  - Wan-AI/Wan2.1-T2V-14B
 
5
  pipeline_tag: text-to-video
 
6
  ---
7
  <div align="center">
8
 
9
  <h1>
@@ -45,9 +55,90 @@ pipeline_tag: text-to-video
45
 
46
  ## 🚀 Quick Start
47
 
48
- Please see [Github](https://github.com/WeChatCV/Wan-Alpha) for code running details
49
 
50
 
51
 
52
  ## 🀝 Acknowledgements
53
 
@@ -81,4 +172,4 @@ If you find our work helpful for your research, please consider citing our paper
81
 
82
  ## 📬 Contact Us
83
 
84
- If you have any questions or suggestions, feel free to reach out via [GitHub Issues](https://github.com/WeChatCV/Wan-Alpha/issues). We look forward to your feedback!
 
1
  ---
 
2
  base_model:
3
  - Wan-AI/Wan2.1-T2V-14B
4
+ license: apache-2.0
5
  pipeline_tag: text-to-video
6
+ library_name: diffusers
7
  ---
8
+
9
+ # Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel
10
+
11
+ [Paper Link](https://huggingface.co/papers/2509.24979)
12
+
13
+ ## Abstract
14
+
15
+ RGBA video generation, which includes an alpha channel to represent transparency, is gaining increasing attention across a wide range of applications. However, existing methods often neglect visual quality, limiting their practical usability. In this paper, we propose Wan-Alpha, a new framework that generates transparent videos by learning both RGB and alpha channels jointly. We design an effective variational autoencoder (VAE) that encodes the alpha channel into the RGB latent space. Then, to support the training of our diffusion transformer, we construct a high-quality and diverse RGBA video dataset. Compared with state-of-the-art methods, our model demonstrates superior performance in visual quality, motion realism, and transparency rendering. Notably, our model can generate a wide variety of semi-transparent objects, glowing effects, and fine-grained details such as hair strands. The released model is available on our website: this https URL .
16
+
17
  <div align="center">
18
 
19
  <h1>
 
55
 
56
  ## 🚀 Quick Start
57
 
58
+ ### 1. Environment Setup
59
+ ```bash
60
+ # Clone the project repository
61
+ git clone https://github.com/WeChatCV/Wan-Alpha.git
62
+ cd Wan-Alpha
63
+
64
+ # Create and activate Conda environment
65
+ conda create -n Wan-Alpha python=3.11 -y
66
+ conda activate Wan-Alpha
67
+
68
+ # Install dependencies
69
+ pip install -r requirements.txt
70
+ ```
71
+
72
+ ### 2. Model Download
73
+ Download [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)
74
+
75
+ Download [Lightx2v-T2V-14B](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)
76
+
77
+ Download [Wan-Alpha VAE](https://huggingface.co/htdong/Wan-Alpha)
78
+
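The three downloads above can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The local directory names under `weights/` are hypothetical, and it is an assumption that a full snapshot of `htdong/Wan-Alpha` contains the VAE decoder and DoRA weights referenced in the usage command.

```python
from pathlib import Path

# (repo_id, filename or None for the whole repo, local subdirectory).
# The local subdirectory names are hypothetical; pick any layout you like.
DOWNLOADS = [
    ("Wan-AI/Wan2.1-T2V-14B", None, "Wan2.1-T2V-14B"),
    (
        "Kijai/WanVideo_comfy",
        "Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors",
        "lightx2v",
    ),
    ("htdong/Wan-Alpha", None, "Wan-Alpha"),
]


def download_all(target_dir: str = "weights") -> None:
    """Fetch every entry in DOWNLOADS into target_dir (these are large files)."""
    # Imported lazily so the download plan can be inspected without the package.
    from huggingface_hub import hf_hub_download, snapshot_download

    for repo_id, filename, subdir in DOWNLOADS:
        local_dir = Path(target_dir) / subdir
        if filename is None:
            # Whole repository snapshot.
            snapshot_download(repo_id=repo_id, local_dir=local_dir)
        else:
            # Single file; it keeps its path inside the repo under local_dir.
            hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Calling `download_all()` starts the downloads; the resulting paths can then be passed to `--ckpt_dir`, `--lightx2v_path`, `--vae_lora_checkpoint`, and `--lora_path`.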
79
+ ### 🧪 Usage
80
+ You can test our model through:
81
+ ```bash
82
+ torchrun --nproc_per_node=8 --master_port=29501 generate_dora_lightx2v.py --size 832*480 \
83
+ --ckpt_dir "path/to/your/Wan-2.1/Wan2.1-T2V-14B" \
84
+ --dit_fsdp --t5_fsdp --ulysses_size 8 \
85
+ --vae_lora_checkpoint "path/to/your/decoder.bin" \
86
+ --lora_path "path/to/your/epoch-13-1500.safetensors" \
87
+ --lightx2v_path "path/to/your/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors" \
88
+ --sample_guide_scale 1.0 \
89
+ --frame_num 81 \
90
+ --sample_steps 4 \
91
+ --lora_ratio 1.0 \
92
+ --lora_prefix "" \
93
+ --prompt_file ./data/prompt.txt \
94
+ --output_dir ./output
95
+ ```
96
+ You can specify the weights of `Wan2.1-T2V-14B` with `--ckpt_dir`, `LightX2V-T2V-14B` with `--lightx2v_path`, `Wan-Alpha-VAE` with `--vae_lora_checkpoint`, and `Wan-Alpha-T2V` with `--lora_path`. Finally, you can find the rendered RGBA videos with a checkerboard background and the PNG frames in the directory given by `--output_dir`.
97
+
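To illustrate what a checkerboard-background preview of an RGBA frame means, here is a minimal sketch of standard alpha blending for a single pixel. It is not taken from the repository: the tile size, the two gray levels, and the assumption of straight (non-premultiplied) 8-bit alpha are all illustrative choices.

```python
def checker_gray(x: int, y: int, tile: int = 8) -> int:
    """Gray level of a checkerboard background at pixel (x, y)."""
    # Alternate between two gray levels every `tile` pixels (illustrative values).
    return 255 if ((x // tile) + (y // tile)) % 2 == 0 else 204


def composite_pixel(
    rgb: tuple[int, int, int], alpha: int, x: int, y: int
) -> tuple[int, int, int]:
    """Blend one straight-alpha RGBA pixel over the checkerboard."""
    a = alpha / 255.0
    bg = checker_gray(x, y)
    # Standard "over" operator: out = a * foreground + (1 - a) * background.
    return tuple(round(a * c + (1.0 - a) * bg) for c in rgb)
```

A fully opaque pixel (`alpha=255`) keeps its own color, while a fully transparent one (`alpha=0`) shows only the checkerboard, which is why transparent regions appear as a gray and white grid in such previews.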
98
+ **Prompt Writing Tip:** You need to specify that the background of the video is transparent, the visual style, the shot type (such as close-up, medium shot, wide shot, or extreme close-up), and a description of the main subject. Prompts support both Chinese and English input.
99
 
100
+ ```bash
101
+ # An example prompt.
102
+ This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.
103
+ ```
104
+
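The prompt structure described in the tip can be assembled programmatically. This is a minimal sketch: the helper names `build_prompt` and `write_prompt_file` are hypothetical, and writing one prompt per line is an assumption about the format expected by `--prompt_file`.

```python
from pathlib import Path


def build_prompt(subject: str, shot_type: str, style: str) -> str:
    """Compose a prompt with the four elements named in the tip above."""
    parts = [
        "This video has a transparent background",
        f"{shot_type} shot",
        subject,
        f"{style} style",
    ]
    # Normalize each part, then join them as short sentences.
    return ". ".join(p.strip().rstrip(".") for p in parts) + "."


def write_prompt_file(prompts: list[str], path: str) -> None:
    """Write prompts to a text file, one per line (assumed file format)."""
    Path(path).write_text("\n".join(prompts) + "\n", encoding="utf-8")
```

For example, `build_prompt("A colorful parrot flying", "Close-up", "Realistic")` reproduces the example prompt shown above.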
105
+ ## 🔨 Official ComfyUI Version
106
+
107
+ Note: We have reorganized our models to ensure they can be easily loaded into ComfyUI. Please note that these models differ from the ones mentioned above.
108
+
109
+ ### 1. Download models
110
+ - The Wan DiT base model: [wan2.1_t2v_14B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/diffusion_models/wan2.1_t2v_14B_fp16.safetensors)
111
+ - The Wan text encoder: [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
112
+ - The LightX2V model: [lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)
113
+ - Our RGBA DoRA: [epoch-13-1500_changed.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/epoch-13-1500_changed.safetensors)
114
+ - Our RGB VAE Decoder: [wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors)
115
+ - Our Alpha VAE Decoder: [wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors)
116
 
117
+ ### 2. Copy the files into the `ComfyUI/models` folder and organize them as follows:
118
+
119
+ ```
120
+ ComfyUI/models
121
+ ├── diffusion_models
122
+ │   └── wan2.1_t2v_14B_fp16.safetensors
123
+ ├── loras
124
+ │   ├── epoch-13-1500_changed.safetensors
125
+ │   └── lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors
126
+ ├── text_encoders
127
+ │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
128
+ ├── vae
129
+ │   ├── wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors
130
+ │   └── wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors
131
+ ```
132
+
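Before loading the workflow, it can help to confirm that every file sits in the expected subfolder. The following is a minimal sketch using only the standard library; the function name `missing_model_files` is hypothetical, and the file list simply mirrors the layout shown above.

```python
from pathlib import Path

# Expected files per subfolder of ComfyUI/models, copied from the layout above.
EXPECTED_FILES = {
    "diffusion_models": ["wan2.1_t2v_14B_fp16.safetensors"],
    "loras": [
        "epoch-13-1500_changed.safetensors",
        "lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors",
    ],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": [
        "wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors",
        "wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors",
    ],
}


def missing_model_files(models_dir: str) -> list[str]:
    """Return the relative paths of expected files not found under models_dir."""
    root = Path(models_dir)
    return [
        f"{subdir}/{name}"
        for subdir, names in EXPECTED_FILES.items()
        for name in names
        if not (root / subdir / name).is_file()
    ]
```

Running `missing_model_files("ComfyUI/models")` returns an empty list when all six files are in place.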
133
+ ### 3. Install our custom RGBA video previewer and PNG frames zip packer. Copy the file [RGBA_save_tools.py](comfyui/RGBA_save_tools.py) into the `ComfyUI/custom_nodes` folder.
134
+
135
+ - Thanks to @mr-lab for an improved WebP version! You can find it in this [issue](https://github.com/WeChatCV/Wan-Alpha/issues/4).
136
+
137
+ ### 4. Example workflow: [wan_alpha_t2v_14B.json](comfyui/wan_alpha_t2v_14B.json)
138
+
139
+ <img src="comfyui/comfyui.jpg" style="margin:auto;"/>
140
+
141
+ ---
142
 
143
  ## 🀝 Acknowledgements
144
 
 
172
 
173
  ## 📬 Contact Us
174
 
175
+ If you have any questions or suggestions, feel free to reach out via [GitHub Issues](https://github.com/WeChatCV/Wan-Alpha/issues). We look forward to your feedback!