+---
+license: apache-2.0
+language:
+- en
+- zh
+pipeline_tag: text-to-video
+tags:
+- zen
+- hanzo-ai
+- video-generation
+- text-to-video
+- image-to-video
+- wan2.2
+- diffusion
+base_model: Wan-AI/Wan2.2-TI2V-5B
+---
+# Zen Director
+Video generation model based on Wan 2.2, specialized for text-to-video and image-to-video generation.
+## Base Model
+Built on **[Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)**, a 5B-parameter text-image-to-video (TI2V) model that handles both text-to-video and image-to-video generation.
+**Note:** This model is based on Wan 2.2. Wan 2.5 has been announced but is not yet open-source; we will upgrade to Wan 2.5 once it is released.
+## Capabilities
+- **Text-to-Video**: Generate videos from text descriptions
+- **Image-to-Video**: Animate static images into videos
+- **High Resolution**: Generates 720P video (1280x704 or 704x1280) at 24 FPS
+- **Efficient**: The high-compression Wan2.2-VAE (4×16×16 in time × height × width) makes single-GPU inference practical
+## Model Details
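+- **Architecture**: Dense diffusion transformer with the high-compression Wan2.2-VAE (Wan 2.2's Mixture-of-Experts design is used only in the A14B models, not in TI2V-5B)
+- **Parameters**: 5B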
+- **Base**: Wan 2.2 TI2V
+- **Resolution**: 720P (1280x704 or 704x1280)
+- **Frame Rate**: 24 FPS
+- **Duration**: Up to 5 seconds
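The frame rate and duration above interact with the VAE's temporal compression. The sketch below assumes a 4× temporal compression in which the first frame is encoded on its own, so valid clip lengths have the form 4k + 1 frames (121 frames is about 5 s at 24 FPS). The snapping rule mirrors, as far as we know, the rounding diffusers' Wan pipeline applies with a warning; all helper names are hypothetical and not part of any library.

```python
# Hypothetical helpers relating duration, frame rate and a valid frame count.
# Assumption: the VAE compresses time 4x and encodes the first frame separately,
# so valid frame counts have the form 4k + 1.
TEMPORAL_COMPRESSION = 4


def snap_num_frames(num_frames: int) -> int:
    """Round down to a multiple of 4, then add one, giving a 4k + 1 frame count."""
    return num_frames // TEMPORAL_COMPRESSION * TEMPORAL_COMPRESSION + 1


def frames_for_duration(duration_s: float, fps: int = 24) -> int:
    """Valid frame count for a clip of roughly duration_s seconds at fps."""
    return snap_num_frames(round(duration_s * fps))


def duration_seconds(num_frames: int, fps: int = 24) -> float:
    """Playback length in seconds of num_frames at fps."""
    return num_frames / fps


def latent_frames(num_frames: int) -> int:
    """Number of latent frames the assumed 4x temporal VAE produces."""
    if (num_frames - 1) % TEMPORAL_COMPRESSION != 0:
        raise ValueError("num_frames must have the form 4k + 1")
    return (num_frames - 1) // TEMPORAL_COMPRESSION + 1


if __name__ == "__main__":
    n = frames_for_duration(5.0)
    print(n, round(duration_seconds(n), 2), latent_frames(n))
```

Requesting 120 frames therefore ends up as 121, which is why the examples on this card use `num_frames=121`.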
+## Installation
+```bash
+pip install diffusers transformers accelerate torch
+pip install av opencv-python pillow
+```
+## Usage
+### Text-to-Video
+```python
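+import torch
+from diffusers import DiffusionPipeline
+from diffusers.utils import export_to_video
+# Load the model in half precision and move it to the GPU
+pipe = DiffusionPipeline.from_pretrained(
+    "zenlm/zen-director",
+    torch_dtype=torch.float16
+)
+pipe = pipe.to("cuda")
+# Generate video from text. The VAE compresses time 4x, so num_frames should be 4k + 1
+# (121 frames is about 5 s at 24 FPS); height and width should be multiples of 32.
+prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
+# .frames is a batch of videos; [0] selects the single video generated here
+video = pipe(prompt, num_frames=121, height=704, width=1280).frames[0]
+# Save video
+export_to_video(video, "output.mp4", fps=24)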
+```
+### Image-to-Video
+```python
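+from PIL import Image
+from diffusers.utils import export_to_video
+# Reuses `pipe` loaded in the Text-to-Video example
+# Load the starting image as RGB
+image = Image.open("input.jpg").convert("RGB")
+# Generate video from image (121 frames is about 5 s at 24 FPS)
+video = pipe(
+    prompt="Animate this image with gentle camera movement",
+    image=image,
+    num_frames=121
+).frames[0]
+export_to_video(video, "animated.mp4", fps=24)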
+```
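To avoid stretching an input image whose aspect ratio differs from 1280x704, one option is to pick an output size that keeps the image's aspect ratio while staying near the 720P pixel budget. The sketch below assumes sizes must be multiples of 32 (the Wan2.2-VAE's 16× spatial downsampling times a patch size of 2); `target_size` is a hypothetical helper modelled on the resizing recipe used in the diffusers Wan examples, not an API of this model.

```python
import math

# Assumption: output height and width must be multiples of 32
# (16x VAE downsampling x patch size 2).
SPATIAL_MULTIPLE = 32
MAX_AREA = 1280 * 704  # pixel budget of the 720P setting


def target_size(img_width: int, img_height: int,
                max_area: int = MAX_AREA,
                multiple: int = SPATIAL_MULTIPLE) -> tuple[int, int]:
    """Return (width, height) near max_area pixels, keeping the aspect ratio and aligned to multiple."""
    aspect = img_height / img_width
    # Solve width * height = max_area with height / width = aspect, then round down to the grid
    height = round(math.sqrt(max_area * aspect)) // multiple * multiple
    width = round(math.sqrt(max_area / aspect)) // multiple * multiple
    return width, height


if __name__ == "__main__":
    for size in [(1280, 704), (704, 1280), (1024, 1024)]:
        print(size, "->", target_size(*size))
```

With PIL, `Image.size` is `(width, height)`, so `width, height = target_size(*image.size)` followed by `image = image.resize((width, height))` prepares the input; pass the same `height` and `width` to the pipeline if it accepts them.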
+## Performance
+- **Inference Speed**: ~2-3 seconds/frame on A100
+- **Memory**: Requires 24GB+ VRAM for full resolution
+- **Precision**: FP16 (half precision) recommended for consumer GPUs
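The speed and memory figures above can be turned into rough per-clip estimates. The sketch below is plain arithmetic under stated assumptions: the ~2-3 s/frame A100 figure is taken to mean per output frame, and FP16 weights take 2 bytes per parameter. It counts the 5B-parameter model weights only; the text encoder, VAE and activations add to the total, so real memory use is higher.

```python
# Back-of-envelope estimates from the figures on this card.
# Assumptions: the ~2-3 s/frame A100 speed is per output frame, and FP16 weights
# take 2 bytes per parameter. Text encoder, VAE and activations are ignored.
BYTES_PER_PARAM_FP16 = 2


def generation_minutes(num_frames: int, sec_per_frame: float) -> float:
    """Estimated wall-clock minutes to generate one clip."""
    return num_frames * sec_per_frame / 60


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for spf in (2.0, 3.0):
        print(f"{spf:.0f} s/frame -> {generation_minutes(121, spf):.1f} min per 121-frame clip")
    print(f"5B params in FP16 -> {weight_memory_gb(5e9):.0f} GB of weights")
```

Under these assumptions a 121-frame clip takes roughly 4 to 6 minutes on an A100, and the FP16 weights alone occupy about 10 GB.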
+## Roadmap
+- ✅ **v1.0** - Wan 2.2 TI2V-5B base (current)
+- 🔄 **v2.0** - Upgrade to Wan 2.5 when open-source
+- 📋 **Future** - Fine-tuning for specific styles and domains
+## Limitations
+- Requires high-end GPU (24GB+ VRAM recommended)
+- Video duration limited to 5 seconds
+- Best results with detailed, specific prompts
+- Some motion artifacts in complex scenes
+## Citation
+```bibtex
+@misc{zen-director-2025,
+  title={Zen Director: Video Generation with Wan 2.2},
+  author={Hanzo AI},
+  year={2025},
+  publisher={HuggingFace},
+  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
+}
+@article{wan2025,
+  title={Wan: Open and Advanced Large-Scale Video Generative Models},
+  author={{Team Wan} and others},
+  journal={arXiv preprint arXiv:2503.20314},
+  year={2025}
+}
+```
+## License
+Apache 2.0