zeekay committed on
Commit
af14b0e
·
verified ·
1 Parent(s): 79e8305

Initial commit: Zen Director based on Wan 2.2 TI2V-5B

Files changed (1)
  1. README.md +137 -0
README.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ pipeline_tag: text-to-video
+ tags:
+ - zen
+ - hanzo-ai
+ - video-generation
+ - text-to-video
+ - image-to-video
+ - wan2.2
+ - diffusion
+ base_model: Wan-AI/Wan2.2-TI2V-5B
+ ---
+
+ # Zen Director
+
+ Video generation model based on Wan 2.2, specialized for text-to-video and image-to-video generation.
+
+ ## Base Model
+
+ Built on **[Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)**, a Text-Image-to-Video (TI2V) model with 5B parameters.
+
+ **Note:** This is based on Wan 2.2. Wan 2.5 has been announced but is not yet open-source. We will upgrade to Wan 2.5 when it becomes available.
+
+ ## Capabilities
+
+ - **Text-to-Video**: Generate videos from text descriptions
+ - **Image-to-Video**: Animate static images into videos
+ - **High Resolution**: Generates video at up to 1280x720
+ - **Efficient**: Compact 5B-parameter model for fast inference
+
+ ## Model Details
+
+ - **Architecture**: Dense diffusion Transformer (in the Wan 2.2 family, the Mixture-of-Experts design belongs to the A14B models, not TI2V-5B)
+ - **Parameters**: 5B total
+ - **Base**: Wan 2.2 TI2V
+ - **Resolution**: Up to 1280x720
+ - **Frame Rate**: 24 FPS
+ - **Duration**: Up to 5 seconds
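
The frame rate and duration limits above determine how many frames to request. The sketch below works that out: 5 seconds at 24 FPS is 120 frames. The helper names are hypothetical, and the `4k + 1` snapping is an assumption about how Wan-style pipelines handle frame counts, so check it against the pipeline you actually load.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Raw frame count for a clip of the given length."""
    return round(seconds * fps)


def snap_to_4k_plus_1(num_frames: int) -> int:
    """Nearest frame count of the form 4k + 1.

    ASSUMPTION: the pipeline prefers frame counts of this form. This is
    not stated in the model card.
    """
    k = max((num_frames + 1) // 4, 0)
    return 4 * k + 1


if __name__ == "__main__":
    raw = frames_for_duration(5)       # 5 s at the default 24 FPS
    print(raw, snap_to_4k_plus_1(raw))
```

If the loaded pipeline has no such constraint, the raw count can be passed as `num_frames` unchanged.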
+
+ ## Installation
+
+ ```bash
+ pip install diffusers transformers accelerate torch
+ pip install av opencv-python pillow
+ ```
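
After installing, a quick check that every package can be found may save a failed model load later. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list uses import names, which differ from the pip names for `opencv-python` (`cv2`) and `pillow` (`PIL`).

```python
from importlib.util import find_spec

# Import names for the packages installed above.
REQUIRED_MODULES = ["diffusers", "transformers", "accelerate", "torch", "av", "cv2", "PIL"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```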
+
+ ## Usage
+
+ ### Text-to-Video
+
+ ```python
+ from diffusers import DiffusionPipeline
+ from diffusers.utils import export_to_video
+ import torch
+
+ # Load the model in half precision and move it to the GPU
+ pipe = DiffusionPipeline.from_pretrained(
+     "zenlm/zen-director",
+     torch_dtype=torch.float16
+ )
+ pipe = pipe.to("cuda")
+
+ # Generate video from text (120 frames is 5 seconds at 24 FPS)
+ prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
+ # .frames holds a batch of videos; take the first (and only) one
+ video = pipe(prompt, num_frames=120, height=720, width=1280).frames[0]
+
+ # Save video
+ export_to_video(video, "output.mp4", fps=24)
+ ```
+
+ ### Image-to-Video
+
+ ```python
+ from PIL import Image
+ from diffusers.utils import export_to_video
+
+ # Load starting image
+ image = Image.open("input.jpg")
+
+ # Generate video from image, reusing `pipe` from the text-to-video example
+ video = pipe(
+     prompt="Animate this image with gentle camera movement",
+     image=image,
+     num_frames=120
+ ).frames[0]  # .frames holds a batch of videos; take the first one
+
+ export_to_video(video, "animated.mp4", fps=24)
+ ```
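
Input images often exceed the 1280x720 ceiling listed under Model Details. The sketch below computes a target size that fits within it while keeping the aspect ratio. `fit_within` is a hypothetical helper, and snapping to a multiple of 16 is an assumption about what the pipeline accepts, not something the model card states.

```python
def fit_within(width, height, max_width=1280, max_height=720, multiple=16):
    """Scale (width, height) down to fit the box, keeping the aspect ratio.

    Sizes are snapped down to a multiple of `multiple` (ASSUMPTION, not a
    documented requirement of this model). Images are never upscaled.
    """
    if width > max_width or height > max_height:
        if width * max_height > height * max_width:
            # Width is the limiting side.
            width, height = max_width, height * max_width // width
        else:
            # Height is the limiting side (or the aspect ratios match).
            width, height = width * max_height // height, max_height

    def snap(value):
        return max(value // multiple, 1) * multiple

    return snap(width), snap(height)


if __name__ == "__main__":
    print(fit_within(1920, 1080))
    print(fit_within(4000, 3000))
```

With Pillow, the result can be applied as `image = image.resize(fit_within(*image.size))` before the image is passed to the pipeline.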
+
+ ## Performance
+
+ - **Inference Speed**: ~2-3 seconds/frame on A100
+ - **Memory**: Requires 24GB+ VRAM for full resolution
+ - **Precision**: FP16 (half precision) recommended for consumer GPUs
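
The figures above allow a rough budget before a run. The sketch below multiplies the quoted 2-3 seconds/frame by the frame count and estimates the FP16 weight footprint for 5B parameters at 2 bytes each. The function names are hypothetical, and the memory figure covers weights only; activations and any auxiliary components come on top, which would be consistent with the 24GB+ recommendation.

```python
def generation_time_range(num_frames, sec_per_frame=(2.0, 3.0)):
    """Return (low, high) wall-clock estimates in seconds."""
    low, high = sec_per_frame
    return num_frames * low, num_frames * high


def weight_memory_gb(num_params=5e9, bytes_per_param=2):
    """Memory for the weights alone, in GB (10^9 bytes). FP16 uses 2 bytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    low, high = generation_time_range(120)   # a 5-second clip at 24 FPS
    print(f"{low / 60:.0f}-{high / 60:.0f} minutes, {weight_memory_gb():.0f} GB of weights")
```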
+
+ ## Roadmap
+
+ - ✅ **v1.0** - Wan 2.2 TI2V-5B base (current)
+ - 🔄 **v2.0** - Upgrade to Wan 2.5 when open-source
+ - 📋 **Future** - Fine-tuning for specific styles and domains
+
+ ## Limitations
+
+ - Requires high-end GPU (24GB+ VRAM recommended)
+ - Video duration limited to 5 seconds
+ - Best results with detailed, specific prompts
+ - Some motion artifacts in complex scenes
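
The size and duration limits above can be checked before an expensive generation call. This is a minimal sketch: `check_request` and its constants are hypothetical, taken from the numbers in this README and not from any API of the model.

```python
MAX_WIDTH, MAX_HEIGHT = 1280, 720   # "Up to 1280x720"
MAX_SECONDS = 5                     # "Up to 5 seconds"
DEFAULT_FPS = 24


def check_request(width, height, num_frames, fps=DEFAULT_FPS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        problems.append(f"{width}x{height} exceeds {MAX_WIDTH}x{MAX_HEIGHT}")
    # One extra frame is tolerated so that 121-frame clips still pass.
    if num_frames > MAX_SECONDS * fps + 1:
        problems.append(f"{num_frames} frames at {fps} FPS exceeds {MAX_SECONDS} seconds")
    return problems


if __name__ == "__main__":
    for problem in check_request(1920, 1080, 240):
        print("Problem:", problem)
```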
+
+ ## Citation
+
+ ```bibtex
+ @misc{zen-director-2025,
+   title={Zen Director: Video Generation with Wan 2.2},
+   author={Hanzo AI},
+   year={2025},
+   publisher={HuggingFace},
+   howpublished={\url{https://huggingface.co/zenlm/zen-director}}
+ }
+
+ @article{wan2025,
+   title={Wan: Open and Advanced Large-Scale Video Generative Models},
+   author={{Team Wan}},
+   journal={arXiv preprint arXiv:2503.20314},
+   year={2025}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ---
+
+ **Note**: Based on Wan 2.2. Will be upgraded to Wan 2.5 when it becomes open-source.