zen-director
5B-parameter text/image-to-video generation model for professional video synthesis
Model Details
- Developed by: Zen Research Authors
- Organization: Zen Research DAO under Zoo Labs Inc (501(c)(3) Non-Profit)
- Location: San Francisco, California, USA
- Model type: text/image-to-video
- Architecture: Diffusion Transformer (5B)
- Parameters: 5B
- License: Apache 2.0
- Training: Trained with Zen Gym
- Inference: Optimized for Zen Engine
Zen AI Ecosystem
This model is part of the Zen Research hypermodal AI family, an open-source ecosystem spanning language, 3D and world generation, video, and audio models.
Complete Model Family
Language Models:
- zen-nano-0.6b - 0.6B edge model (44K tokens/sec)
- zen-eco-4b-instruct - 4B instruction model
- zen-eco-4b-thinking - 4B reasoning model
- zen-agent-4b - 4B tool-calling agent
3D & World Generation:
- zen-3d - Controllable 3D asset generation
- zen-voyager - Camera-controlled world exploration
- zen-world - Large-scale world simulation
Video Generation:
- zen-director - Text/image-to-video (5B)
- zen-video - Professional video synthesis
- zen-video-i2v - Image-to-video animation
Audio Generation:
- zen-musician - Music generation (7B)
- zen-foley - Video-to-audio Foley effects
Infrastructure:
- Zen Gym - Unified training platform
- Zen Engine - High-performance inference
Usage
Quick Start
```python
from zen_director import ZenDirectorPipeline

# Load the text/image-to-video pipeline
pipeline = ZenDirectorPipeline.from_pretrained("zenlm/zen-director")

video = pipeline(
    prompt="A cinematic shot of a sunset over mountains",
    num_frames=120,
    fps=24,
    resolution=(1280, 720),
)
video.save("output.mp4")
```
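zen-director also supports image-to-video generation. The sketch below conditions the clip on a first frame; the `image` keyword is an assumed parameter name, so check the zen_director API for the exact argument:

```python
from PIL import Image
from zen_director import ZenDirectorPipeline

pipeline = ZenDirectorPipeline.from_pretrained("zenlm/zen-director")

# Animate a still image, guided by a text prompt.
first_frame = Image.open("sunset.jpg")  # illustrative input image
video = pipeline(
    prompt="The sun sinks below the ridge as clouds drift past",
    image=first_frame,  # assumed keyword for first-frame conditioning
    num_frames=120,
    fps=24,
    resolution=(1280, 720),
)
video.save("animated.mp4")
```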
With Zen Engine
```bash
# Serve zen-director with Zen Engine's high-performance inference server
zen-engine serve --model zenlm/zen-director --port 3690
```

Zen Engine exposes an OpenAI-compatible API:

```python
from openai import OpenAI

# Local server; the api_key value is unused but required by the client.
client = OpenAI(base_url="http://localhost:3690/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="zenlm/zen-director",
    messages=[{"role": "user", "content": "A cinematic shot of a sunset over mountains"}],
)
```
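The chat-completions schema has no standard fields for video settings. If the Zen Engine server accepts them, one plausible route is the OpenAI client's `extra_body` passthrough; the field names below simply mirror the pipeline arguments above and are an assumption, not documented Zen Engine options:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3690/v1", api_key="not-needed")

# Sketch: forward video settings as extra JSON fields (assumed server support).
response = client.chat.completions.create(
    model="zenlm/zen-director",
    messages=[{"role": "user", "content": "A cinematic shot of a sunset over mountains"}],
    extra_body={"num_frames": 120, "fps": 24},  # assumed field names
)
```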
Training
Fine-tune with Zen Gym:
```bash
git clone https://github.com/zenlm/zen-gym
cd zen-gym

# LoRA fine-tuning
llamafactory-cli train --config configs/zen_lora.yaml \
    --model_name_or_path zenlm/zen-director

# GRPO reinforcement learning (40-60% memory reduction)
llamafactory-cli train --config configs/zen_grpo.yaml \
    --model_name_or_path zenlm/zen-director
```
Supported methods: LoRA, QLoRA, DoRA, GRPO, GSPO, DPO, PPO, KTO, ORPO, SimPO, Unsloth
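For orientation, the sketch below shows an equivalent LoRA setup done programmatically with the peft library rather than the Zen Gym configs above. Loading the checkpoint through transformers' AutoModel and the target_modules names are assumptions and may not match zen-director's diffusion transformer:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Assumption: the checkpoint is loadable through transformers' AutoModel.
base = AutoModel.from_pretrained("zenlm/zen-director")

# Attach low-rank adapters to the attention projections (module names assumed).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```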
Performance
- Speed: ~60 s to generate a 5-second clip on an RTX 4090
- Resolution: up to 1280x720
- Frame rate: 24 FPS
- Duration: up to 10 seconds per clip
- Quality: professional-grade video synthesis
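These figures imply roughly half a second of compute per frame; a quick sanity check on the numbers above:

```python
# Back-of-envelope check on the performance figures above.
clip_seconds = 5
fps = 24
wall_clock_seconds = 60  # measured on an RTX 4090

frames = clip_seconds * fps              # 120 frames per 5-second clip
per_frame = wall_clock_seconds / frames  # ~0.5 s of compute per frame
print(f"{frames} frames, ~{per_frame:.2f}s per frame")
```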
Ethical Considerations
- Open Research: Released under Apache 2.0 for maximum accessibility
- Environmental Impact: Optimized for eco-friendly deployment
- Transparency: Full training details and model architecture disclosed
- Safety: Comprehensive testing and evaluation
- Non-Profit: Developed by Zoo Labs Inc (501(c)(3)) for public benefit
Citation
```bibtex
@misc{zendirector2025,
  title={zen-director: 5B-parameter text/image-to-video generation model for professional video synthesis},
  author={Zen Research Authors},
  year={2025},
  publisher={Zoo Labs Inc},
  organization={Zen Research DAO},
  url={https://huggingface.co/zenlm/zen-director}
}
```
Links
- Organization: github.com/zenlm • huggingface.co/zenlm
- Training Platform: Zen Gym
- Inference Engine: Zen Engine
- Parent Org: Zoo Labs Inc (501(c)(3) Non-Profit, San Francisco)
- Contact: [email protected] • +1 (913) 777-4443
License
Apache License 2.0
Copyright 2025 Zen Research Authors
Zen Research - Building open, eco-friendly AI for everyone