---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-video
library_name: diffusers
tags:
- video
- video-generation
---
# Wan-Fun

😊 Welcome!

[Hugging Face Space](https://huggingface.co/spaces/alibaba-pai/Wan2.1-Fun-1.3B-InP)

[GitHub](https://github.com/aigc-apps/VideoX-Fun)

[English](./README_en.md) | [简体中文](./README.md)
# Table of Contents

- [Table of Contents](#table-of-contents)
- [Model Zoo](#model-zoo)
- [Video Result](#video-result)
- [Quick Start](#quick-start)
- [How to Use](#how-to-use)
- [Reference](#reference)
- [License](#license)
# Model Zoo

| Name | Storage Size | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| Wan2.2-Fun-A14B-InP | 47.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-InP) | Wan2.2-Fun-A14B text-to-video generation weights, trained at multiple resolutions, with support for start- and end-frame image prediction. |
| Wan2.2-Fun-A14B-Control | 47.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-Control) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control) | Wan2.2-Fun-A14B video control weights, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports multi-resolution (512, 768, 1024) video prediction at 81 frames, trained at 16 frames per second, with multilingual prediction support. |
# Video Result

### Wan2.2-Fun-A14B-InP

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_1.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_2.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_3.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_4.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_5.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_6.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_7.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_8.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>
### Wan2.2-Fun-A14B-Control

Generic Control Video + Reference Image:

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      Reference Image
    </td>
    <td>
      Control Video
    </td>
    <td>
      Wan2.2-Fun-A14B-Control
    </td>
  </tr>
  <tr>
    <td>
      <img src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/8.png" width="100%">
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/14b_ref.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>
Generic Control Video (Canny, Pose, Depth, etc.) and Trajectory Control:

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/guiji.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/guiji_out.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/canny.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/depth.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose_out.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/canny_out.mp4" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/depth_out.mp4" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>
# Quick Start

### 1. Cloud Usage: AliyunDSW/Docker

#### a. From AliyunDSW

DSW offers free GPU time that each user can apply for once; it remains valid for 3 months after approval.

Aliyun provides free GPU time through [Freetier](https://free.aliyun.com/?product=9602825&crowd=enterprise&spm=5176.28055625.J_5831864660.1.e939154aRgha4e&scm=20140722.M_9974135.P_110.MO_1806-ID_9974135-MID_9974135-CID_30683-ST_8512-V_1); claim it and use it in Aliyun PAI-DSW to start CogVideoX-Fun within 5 minutes!

[PAI-DSW Notebook Gallery](https://gallery.pai-ml.com/#/preview/deepLearning/cv/cogvideox_fun)

#### b. From ComfyUI

Our ComfyUI workflow is shown below; refer to the [ComfyUI README](comfyui/README.md) for details.


#### c. From Docker

If you are using Docker, make sure the GPU driver and CUDA environment are correctly installed on your machine.

Then execute the following commands:
```
# pull image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# enter image
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# clone code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# enter VideoX-Fun's dir
cd VideoX-Fun

# create weights folders
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

# Please use the Hugging Face link or ModelScope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP

# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-V1.1-14B-InP
# https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP
# https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-InP
```
### 2. Local Install: Environment Check/Downloading/Installation

#### a. Environment Check

We have verified that this repo runs in the following environments:

Windows details:
- OS: Windows 10
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-3060 12G & Nvidia-3090 24G

Linux details:
- OS: Ubuntu 20.04, CentOS
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G

You will need about 60 GB of free disk space to store the weights; please check before downloading.
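
To confirm your setup matches the list above, a minimal sanity check (a sketch; it assumes only that PyTorch is installed and is not part of this repo):

```python
# Verify the local PyTorch/CUDA setup against the verified environments above.
import torch

print("torch:", torch.__version__)       # verified with 2.2.0
print("CUDA:", torch.version.cuda)       # verified with 11.8 and 12.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**3:.0f} GB VRAM")
```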
#### b. Weights

Place the [weights](#model-zoo) in the following locations:

**Via ComfyUI**:

Put the models into the ComfyUI weights folder `ComfyUI/models/Fun_Models/`:
```
📦 ComfyUI/
├── 📂 models/
│   └── 📂 Fun_Models/
│       ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│       ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│       ├── 📂 Wan2.1-Fun-14B-InP/
│       └── 📂 Wan2.1-Fun-1.3B-InP/
```
**Running the repo's own Python files or UI interface**:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-14B-InP/
│   └── 📂 Wan2.1-Fun-1.3B-InP/
└── 📂 Personalized_Model/
    └── your trained transformer model / your trained lora model (for UI load)
```
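
As one illustration, the weights can be fetched into the expected folder with `huggingface_hub` (a sketch, not the repo's own download script; the ModelScope links in the [Model Zoo](#model-zoo) work equally well):

```python
# Download Wan2.2-Fun-A14B-InP into the folder layout expected above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.2-Fun-A14B-InP",
    local_dir="models/Diffusion_Transformer/Wan2.2-Fun-A14B-InP",
)
```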
# How to Use

<h3 id="video-gen">1. Generation</h3>

#### a. GPU Memory Optimization

Since Wan2.1 has a very large number of parameters, memory-optimization strategies are needed to fit consumer-grade GPUs. Each prediction file provides a `GPU_memory_mode` option with three choices: `model_cpu_offload`, `model_cpu_offload_and_qfloat8`, and `sequential_cpu_offload`. The same options apply to CogVideoX-Fun generation.

- `model_cpu_offload`: The entire model is moved to the CPU after use, saving some GPU memory.
- `model_cpu_offload_and_qfloat8`: The entire model is moved to the CPU after use, and the transformer is quantized to float8, saving more GPU memory.
- `sequential_cpu_offload`: Each layer of the model is moved to the CPU after use. It is slower but saves a significant amount of GPU memory.

`qfloat8` may slightly reduce model quality but saves more GPU memory. If you have sufficient GPU memory, `model_cpu_offload` is recommended.
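
As a rough illustration (not the repo's exact code), the offloading modes correspond to standard diffusers calls; the generic `DiffusionPipeline` loader and the checkpoint path are assumptions:

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical checkpoint path; point this at the weights you downloaded.
pipe = DiffusionPipeline.from_pretrained(
    "models/Diffusion_Transformer/Wan2.1-Fun-1.3B-InP",
    torch_dtype=torch.bfloat16,
)

GPU_memory_mode = "model_cpu_offload"

if GPU_memory_mode == "sequential_cpu_offload":
    # Per-layer offload: slowest option, lowest VRAM usage.
    pipe.enable_sequential_cpu_offload()
else:
    if GPU_memory_mode == "model_cpu_offload_and_qfloat8":
        # The repo additionally quantizes the transformer weights to float8
        # with its own helper before offloading; that step is omitted here.
        pass
    # Per-component offload: each model returns to CPU after its forward pass.
    pipe.enable_model_cpu_offload()
```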

#### b. Using ComfyUI

For details, refer to the [ComfyUI README](https://github.com/aigc-apps/VideoX-Fun/tree/main/comfyui).
#### c. Running Python Files

- **Step 1**: Download the corresponding [weights](#model-zoo) and place them in the `models` folder.
- **Step 2**: Use different files for prediction depending on the weights and your goal. This library currently supports CogVideoX-Fun, Wan2.1, Wan2.1-Fun, and Wan2.2. Models are distinguished by folder names under the `examples` folder, and their supported features differ; use them accordingly. Below is an example using CogVideoX-Fun (a sketch of the typical settings follows this list):
  - **Text-to-Video**:
    - Modify `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_t2v.py`.
    - Run the file `examples/cogvideox_fun/predict_t2v.py` and wait for the results. The generated videos are saved in the folder `samples/cogvideox-fun-videos`.
  - **Image-to-Video**:
    - Modify `validation_image_start`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_i2v.py`.
    - `validation_image_start` is the starting image of the video, and `validation_image_end` is the ending image of the video.
    - Run the file `examples/cogvideox_fun/predict_i2v.py` and wait for the results. The generated videos are saved in the folder `samples/cogvideox-fun-videos_i2v`.
  - **Video-to-Video**:
    - Modify `validation_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v.py`.
    - `validation_video` is the reference video for video-to-video generation. You can use the following demo video: [Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/play_guitar.mp4).
    - Run the file `examples/cogvideox_fun/predict_v2v.py` and wait for the results. The generated videos are saved in the folder `samples/cogvideox-fun-videos_v2v`.
  - **Controlled Video Generation (Canny, Pose, Depth, etc.)**:
    - Modify `control_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v_control.py`.
    - `control_video` is the control video extracted with operators such as Canny, Pose, or Depth. You can use the following demo video: [Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1.1/pose.mp4).
    - Run the file `examples/cogvideox_fun/predict_v2v_control.py` and wait for the results. The generated videos are saved in the folder `samples/cogvideox-fun-videos_v2v_control`.
- **Step 3**: If you want to use other backbones or LoRAs you trained yourself, modify `lora_path` and the relevant paths in `examples/{model_name}/predict_t2v.py` or `examples/{model_name}/predict_i2v.py` as needed.
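
For orientation, the settings edited in Step 2 typically sit near the top of each predict script. The names below follow this README, but the values are illustrative, not the file's actual defaults:

```python
# Illustrative settings block for examples/cogvideox_fun/predict_t2v.py
# (the actual file may order or group these differently).
GPU_memory_mode = "model_cpu_offload"

prompt         = "A panda playing a guitar in a sunlit bamboo forest."
neg_prompt     = "blurry, distorted, low quality, watermark"
guidance_scale = 6.0
seed           = 43

lora_path = None  # Step 3: point this at your own trained LoRA
```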

#### d. Using the Web UI

The web UI supports text-to-video, image-to-video, video-to-video, and controlled video generation (Canny, Pose, Depth, etc.). This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun; models are distinguished by folder names under the `examples` folder, and their supported features differ. Below is an example using CogVideoX-Fun:

- **Step 1**: Download the corresponding [weights](#model-zoo) and place them in the `models` folder.
- **Step 2**: Run the file `examples/cogvideox_fun/app.py` to open the Gradio interface.
- **Step 3**: Select the generation model on the page, fill in `prompt`, `neg_prompt`, `guidance_scale`, and `seed`, click "Generate," and wait for the results. The generated videos are saved in the `sample` folder.

# Reference

- CogVideo: https://github.com/THUDM/CogVideo/
- EasyAnimate: https://github.com/aigc-apps/EasyAnimate
- Wan2.1: https://github.com/Wan-Video/Wan2.1/
- Wan2.2: https://github.com/Wan-Video/Wan2.2/
- ComfyUI-KJNodes: https://github.com/kijai/ComfyUI-KJNodes
- ComfyUI-EasyAnimateWrapper: https://github.com/kijai/ComfyUI-EasyAnimateWrapper
- ComfyUI-CameraCtrl-Wrapper: https://github.com/chaojie/ComfyUI-CameraCtrl-Wrapper
- CameraCtrl: https://github.com/hehao13/CameraCtrl

# License

This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).