---
license: mit
library_name: diffusers
pipeline_tag: image-to-video
---
## 🔥🔥🔥 News!!
* Mar 17, 2025: 🎉 We release the inference code and model weights of Step-Video-TI2V. [Download](https://huggingface.co/stepfun-ai/stepvideo-ti2v)
* Mar 17, 2025: 🎉 We release our technical report. [Read](https://arxiv.org/abs/2502.10248)
## 🔧 Dependencies and Installation
```bash
git clone https://github.com/stepfun-ai/Step-Video-TI2V.git
conda create -n stepvideo python=3.10
conda activate stepvideo
cd Step-Video-TI2V
pip install -e .
```
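The inference scripts below also need a local copy of the model weights (the `where_you_download_dir` placeholder). As a minimal sketch, one way to fetch them is with the `huggingface-cli` tool; the local directory name here is only an example:

```bash
# Download the Step-Video-TI2V weights from the Hugging Face Hub.
# Requires the Hugging Face Hub CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download stepfun-ai/stepvideo-ti2v --local-dir ./stepvideo-ti2v
```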
## 🚀 Inference Scripts
- We employ a decoupling strategy: the text encoder, VAE decoding, and DiT run as separate services so that the DiT can make full use of its GPUs. As a result, a dedicated GPU is needed to serve the APIs for the text-encoder embeddings and VAE decoding (see the device-pinning sketch after the command below).
```bash
# We assume you have more than 4 GPUs available. This command starts the API
# services for the caption (text encoder) and the VAE and returns their URLs;
# use the returned URL in the torchrun command below.
python api/call_remote_server.py --model_dir where_you_download_dir &

parallel=4  # or parallel=8
url='127.0.0.1'
model_dir=where_you_download_dir
torchrun --nproc_per_node $parallel run_parallel.py \
--model_dir $model_dir \
--vae_url $url \
--caption_url $url \
--ulysses_degree $parallel \
--prompt "男孩笑起来" \
--first_image_path ./assets/demo.png \
--infer_steps 50 \
--save_path ./results \
--cfg_scale 9.0 \
--motion_score 5.0 \
--time_shift 12.573
```
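Because the API services and the DiT workers otherwise compete for the same devices, you may want to pin them to separate GPUs. Below is a minimal sketch using `CUDA_VISIBLE_DEVICES`; the device split (GPU 0 for the services, GPUs 1-4 for the DiT) is an assumption for a node with at least 5 GPUs, not part of the official scripts, and it reuses the variables defined above:

```bash
# Sketch: dedicate GPU 0 to the caption/VAE API services and GPUs 1-4 to the DiT.
# The split below is an example; adjust it to your own node layout.
CUDA_VISIBLE_DEVICES=0 python api/call_remote_server.py --model_dir $model_dir &

CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node $parallel run_parallel.py \
  --model_dir $model_dir \
  --vae_url $url \
  --caption_url $url \
  --ulysses_degree $parallel \
  --prompt "男孩笑起来" \
  --first_image_path ./assets/demo.png \
  --infer_steps 50 \
  --save_path ./results \
  --cfg_scale 9.0 \
  --motion_score 5.0 \
  --time_shift 12.573
```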
The following table shows the resource requirements for running the Step-Video-TI2V model (batch size = 1, without CFG distillation) to generate videos:
| GPUs | Height × Width × Frames | Peak GPU memory | Time for 50 steps |
|------|-------------------------|-----------------|-------------------|
| 1    | 768px × 768px × 102f    | 76.42 GB        | 1061 s            |
| 1    | 544px × 992px × 102f    | 75.49 GB        | 929 s             |
| 4    | 768px × 768px × 102f    | 64.63 GB        | 288 s             |
| 4    | 544px × 992px × 102f    | 64.34 GB        | 251 s             |