---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b
language:
- en
tags:
- video-generation
- paddlemix
---

简体中文 | [English](README.md)

# VCtrl

<p align="center">
<a href="https://huggingface.co/PaddleMIX">🤗 Huggingface Space</a> |
<a href="https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl">🌐 Github </a> |
<a href="">📜 arxiv </a> |
<a href="https://pp-vctrl.github.io/">📷 Project </a>
</p>

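## Model Introduction

**VCtrl** is a general-purpose control model for video generation. By introducing an auxiliary conditional encoder, it can connect flexibly to a variety of control modules, and because the original generator is left unchanged, it avoids large-scale retraining. The model uses sparse residual connections to pass control signals efficiently. A unified condition-encoding pipeline converts the different control inputs into a standardized representation, which is combined with task-specific masks to improve adaptability. Thanks to this unified and flexible design, VCtrl can be applied to video generation scenarios such as **character animation**, **scene transition**, and **video editing**. The table below summarizes the video generation models released in this generation:
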
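The idea of sparse residual connections can be pictured with a small schematic. The following is a minimal sketch, not VCtrl's actual implementation: `spaced_indices` and `run_with_sparse_residuals` are hypothetical names, hidden states are plain lists of floats, and the evenly spaced placement of control residuals is an assumption chosen for illustration. It shows how the generator's blocks can stay unchanged while only a subset of them receives an added control signal.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def spaced_indices(num_base_blocks: int, num_control_blocks: int) -> List[int]:
    """Return evenly spaced base-block indices that receive a control residual.

    Assumption for illustration only; the real layout may differ.
    """
    if not 0 < num_control_blocks <= num_base_blocks:
        raise ValueError("need 0 < num_control_blocks <= num_base_blocks")
    stride = num_base_blocks // num_control_blocks
    return [i * stride for i in range(num_control_blocks)]


def run_with_sparse_residuals(
    hidden: Vector,
    base_blocks: Sequence[Block],
    control_features: Sequence[Vector],
    scale: float = 1.0,
) -> Vector:
    """Run the unchanged base blocks, adding control features at a few of them."""
    targets = spaced_indices(len(base_blocks), len(control_features))
    inject_at = dict(zip(targets, control_features))
    for idx, block in enumerate(base_blocks):
        if idx in inject_at:
            # Residual injection: the base block itself is not modified.
            hidden = [h + scale * c for h, c in zip(hidden, inject_at[idx])]
        hidden = block(hidden)
    return hidden


# Toy usage: six identity "blocks" and three control feature vectors.
identity_blocks = [lambda h: h] * 6
controls = [[1.0, 1.0]] * 3
print(spaced_indices(6, 3))
print(run_with_sparse_residuals([0.0, 0.0], identity_blocks, controls))
```

With `scale=0.0` the loop reduces to the base blocks alone, which mirrors the idea that the original generator is left untouched.
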
<table style="border-collapse: collapse; width: 100%;">
  <tr>
    <th style="text-align: center;">Model name</th>
    <th style="text-align: center;">VCtrl-Canny</th>
    <th style="text-align: center;">VCtrl-Mask</th>
    <th style="text-align: center;">VCtrl-Pose</th>
  </tr>
  <tr>
    <td style="text-align: center;">Video resolution</td>
    <td colspan="1" style="text-align: center;">720 * 480</td>
    <td colspan="1" style="text-align: center;">720 * 480</td>
    <td colspan="1" style="text-align: center;">720 * 480 & 480 * 720</td>
  </tr>
  <tr>
    <td style="text-align: center;">Inference precision</td>
    <td colspan="3" style="text-align: center;"><b>FP16 (recommended)</b></td>
  </tr>
  <tr>
    <td style="text-align: center;">Single-GPU memory usage</td>
    <td colspan="3" style="text-align: center;"><b>V100: 32GB minimum*</b></td>
  </tr>
  <tr>
    <td style="text-align: center;">Inference speed<br>(Steps = 25, FP16)</td>
    <td colspan="3" style="text-align: center;">Single A100: ~300 s (49 frames)<br>Single V100: ~400 s (49 frames)</td>
  </tr>
  <tr>
    <td style="text-align: center;">Prompt language</td>
    <td colspan="3" style="text-align: center;">English*</td>
  </tr>
  <tr>
    <td style="text-align: center;">Maximum prompt length</td>
    <td colspan="3" style="text-align: center;">224 Tokens</td>
  </tr>
  <tr>
    <td style="text-align: center;">Video length</td>
    <td colspan="3" style="text-align: center;">T2V models support 49 frames only; I2V models can be extended to any number of frames</td>
  </tr>
  <tr>
    <td style="text-align: center;">Frame rate</td>
    <td colspan="3" style="text-align: center;">30 frames / second</td>
  </tr>
</table>

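The figures in the table translate into a few simple checks before launching a run. Below is a small sketch that encodes the per-model resolutions and the 30 frames per second rate from the table. The names `SUPPORTED_RESOLUTIONS`, `is_supported`, and `clip_duration_seconds` are hypothetical helpers, not part of ppdiffusers, and resolutions are assumed to be written as width by height.

```python
from typing import Dict, Set, Tuple

# (width, height) pairs per control type, copied from the table above.
# Assumption: "720 * 480" means width 720 and height 480.
SUPPORTED_RESOLUTIONS: Dict[str, Set[Tuple[int, int]]] = {
    "canny": {(720, 480)},
    "mask": {(720, 480)},
    "pose": {(720, 480), (480, 720)},
}


def is_supported(task: str, width: int, height: int) -> bool:
    """Return True if the table lists this resolution for the given control type."""
    return (width, height) in SUPPORTED_RESOLUTIONS.get(task, set())


def clip_duration_seconds(num_frames: int, fps: int = 30) -> float:
    """Length of the generated clip in seconds at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


print(is_supported("pose", 480, 720))
print(is_supported("canny", 480, 720))
print(round(clip_duration_seconds(49), 2))
```

At 30 frames per second, a 49-frame clip lasts about 1.6 seconds.
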
## Quick Start 🤗

This model can be deployed with the ppdiffusers library from paddlemix. Follow the steps below.

**We recommend visiting our [github](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl) for a better experience.**

1. Install the required dependencies

```shell
# Clone the PaddleMIX repository
git clone https://github.com/PaddlePaddle/PaddleMIX.git
# Install paddlemix
cd PaddleMIX
pip install -e .
# Install ppdiffusers
pip install -e ppdiffusers
# Install paddlenlp
pip install paddlenlp==v3.0.0-beta2
# Enter the vctrl directory
cd ppdiffusers/examples/ppvctrl
# Install the remaining dependencies
pip install -r requirements.txt
# Install paddlex
pip install paddlex==3.0.0b2
```

2. Run the code

```python
import os
import re

import numpy as np
import paddle
from decord import VideoReader
from moviepy.editor import ImageSequenceClip
from PIL import Image
from ppdiffusers import (
    CogVideoXDDIMScheduler,
    CogVideoXTransformer3DVCtrlModel,
    CogVideoXVCtrlPipeline,
    VCtrlModel,
)


def write_mp4(video_path, samples, fps=8):
    """Write a list of frames (numpy arrays) to an mp4 file."""
    clip = ImageSequenceClip(samples, fps=fps)
    clip.write_videofile(video_path, audio_codec="aac")


def save_vid_side_by_side(batch_output, validation_control_images, output_folder, fps):
    """Save the raw prediction, a control/prediction side-by-side view, and the resized prediction."""
    os.makedirs(output_folder, exist_ok=True)
    flattened_batch_output = [img for sublist in batch_output for img in sublist]
    ori_video_path = os.path.join(output_folder, "origin_predict.mp4")
    video_path = os.path.join(output_folder, "test_1.mp4")
    output_path = os.path.join(output_folder, "output.mp4")
    ori_final_images = []
    final_images = []
    outputs = []

    def get_concat_h(im1, im2):
        # Paste two images next to each other on a shared canvas.
        dst = Image.new("RGB", (im1.width + im2.width, max(im1.height, im2.height)))
        dst.paste(im1, (0, 0))
        dst.paste(im2, (im1.width, 0))
        return dst

    for control_img, raw_predict_img in zip(validation_control_images, flattened_batch_output):
        predict_img = raw_predict_img.resize(control_img.size)
        result = get_concat_h(control_img, predict_img)
        ori_final_images.append(np.array(raw_predict_img))
        final_images.append(np.array(result))
        outputs.append(np.array(predict_img))

    write_mp4(ori_video_path, ori_final_images, fps=fps)
    write_mp4(video_path, final_images, fps=fps)
    write_mp4(output_path, outputs, fps=fps)


def load_images_from_folder_to_pil(folder):
    """Load all images in a folder as RGB PIL images, sorted by frame number."""
    images = []
    valid_extensions = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}

    def frame_number(filename):
        # Prefer the "frame_<n>_7fps" naming scheme, then fall back to the last number in the name.
        new_pattern_match = re.search(r"frame_(\d+)_7fps", filename)
        if new_pattern_match:
            return int(new_pattern_match.group(1))
        matches = re.findall(r"\d+", filename)
        if matches:
            if matches[-1] == "0000" and len(matches) > 1:
                return int(matches[-2])
            return int(matches[-1])
        return float("inf")

    sorted_files = sorted(os.listdir(folder), key=frame_number)
    for filename in sorted_files:
        ext = os.path.splitext(filename)[1].lower()
        if ext in valid_extensions:
            img = Image.open(os.path.join(folder, filename)).convert("RGB")
            images.append(img)
    return images


def load_images_from_video_to_pil(video_path):
    """Decode every frame of a video into a PIL image."""
    images = []
    vr = VideoReader(video_path)
    for idx in range(len(vr)):
        frame = vr[idx].asnumpy()
        images.append(Image.fromarray(frame))
    return images


num_frames = 49
task = "canny"
# Replace 'your_path' with the path to your control video.
validation_control_images = load_images_from_video_to_pil("your_path")
# For task == "mask", also load the mask frames into validation_mask_images.
prompt = "Group of fishes swimming in aquarium."

vctrl = VCtrlModel.from_pretrained(
    "paddlemix/vctrl-5b-t2v-canny",
    low_cpu_mem_usage=True,
    paddle_dtype=paddle.float16,
)
pipeline = CogVideoXVCtrlPipeline.from_pretrained(
    "paddlemix/cogvideox-5b-vctrl",
    vctrl=vctrl,
    paddle_dtype=paddle.float16,
    low_cpu_mem_usage=True,
    map_location="cpu",
)
pipeline.scheduler = CogVideoXDDIMScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing")
pipeline.vae.enable_tiling()
pipeline.vae.enable_slicing()

final_result = []
video = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    num_frames=num_frames,
    guidance_scale=35,
    generator=paddle.Generator().manual_seed(42),
    conditioning_frames=validation_control_images[:num_frames],
    conditioning_frame_indices=list(range(num_frames)),
    conditioning_scale=1.0,
    width=720,
    height=480,
    task=task,
    conditioning_masks=validation_mask_images[:num_frames] if task == "mask" else None,
    vctrl_layout_type="spacing",
).frames[0]
final_result.append(video)

# The third argument is a folder; the three mp4 files are written inside it.
save_vid_side_by_side(final_result, validation_control_images[:num_frames], "outputs", fps=30)
```

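The script slices the control frames with `[:num_frames]` and builds `conditioning_frame_indices` from `range(num_frames)`, so a control video shorter than `num_frames` would leave the two lists with different lengths. A small guard can catch this before the pipeline is called. This is a sketch under the assumption that the two arguments should have equal length; `select_control_frames` is a hypothetical helper, not a ppdiffusers function.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def select_control_frames(
    frames: Sequence[Frame], num_frames: int
) -> Tuple[List[Frame], List[int]]:
    """Return the first num_frames frames together with their indices.

    Raises ValueError when the control video is too short, instead of
    silently passing fewer frames than indices.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if len(frames) < num_frames:
        raise ValueError(
            f"control video has {len(frames)} frames, but {num_frames} are required"
        )
    selected = list(frames[:num_frames])
    indices = list(range(num_frames))
    return selected, indices


# Toy usage: integers stand in for PIL images.
frames, indices = select_control_frames(list(range(60)), 49)
print(len(frames), indices[0], indices[-1])
```

The two returned lists could then be passed as `conditioning_frames` and `conditioning_frame_indices` respectively.
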
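## Going Deeper

Visit our [github](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl), where you will find:

1. More detailed technical descriptions and code explanations.
2. Details of the algorithms used to extract control conditions.
3. Detailed model inference code.
4. The project changelog and updates, with more opportunities to interact.
5. The PaddleMIX toolchain, to help you make better use of the model.
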
<!-- ## Citation

```
@article{yang2024cogvideox,
  title={VCtrl: Enabling Versatile Controls for Video Diffusion Models},
  year={2025}
}
``` -->