SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers


🌐 GitHub · 👋 Playground

This repo contains Diffusers-style model weights for the SkyReels-A1 models. You can find the inference code in the SkyReels-A1 repository.
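As a minimal sketch, the weights in this repo could be fetched locally with `huggingface_hub` before running the inference code. The repo id `Skywork/SkyReels-A1` is taken from this model page; the `local_dir` value and the helper name `download_weights` are assumptions chosen for illustration, and the inference code may expect a different location.

```python
REPO_ID = "Skywork/SkyReels-A1"  # repo id as shown on this model page


def download_weights(local_dir="pretrained_models/SkyReels-A1"):
    """Download every file in the model repo and return the local folder path.

    `local_dir` is a hypothetical default, not a path required by SkyReels-A1.
    """
    # Imported lazily so this module can be loaded without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Usage (requires `pip install huggingface_hub` and network access):
# weights_path = download_weights()
```

The returned path can then be handed to whichever loading routine the inference code provides.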


Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach integrates these landmarks directly into the input latent space. In line with prior work, we employ a pose guidance mechanism built within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
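To make the idea of integrating a condition "into the input latent space" concrete, here is a toy sketch of one common approach: concatenating the encoded landmark latents with the noisy video latents along the channel axis before they enter the DiT. Whether SkyReels-A1 uses exactly this operation is an assumption; the function name, the `(channels, frames, height, width)` layout, and all shapes below are hypothetical.

```python
import numpy as np


def concat_condition(noisy_latents, landmark_latents):
    """Stack condition latents onto video latents along the channel axis.

    Both arrays use an assumed (channels, frames, height, width) layout and
    must agree on every axis except channels.
    """
    if noisy_latents.shape[1:] != landmark_latents.shape[1:]:
        raise ValueError("frames, height and width must match")
    return np.concatenate([noisy_latents, landmark_latents], axis=0)


rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 4, 6, 6))      # stand-in for noisy video latents
landmarks = rng.standard_normal((4, 4, 6, 6))  # stand-in for VAE-encoded landmark frames
dit_input = concat_condition(noisy, landmarks)
print(dit_input.shape)  # (8, 4, 6, 6)
```

With this design the transformer sees the motion descriptor at every spatial and temporal position of the latent, which is one reason channel concatenation is a popular choice for spatially aligned conditions.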


Some generated results:

Citation

If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:

@misc{qiu2025skyreelsa1expressiveportraitanimation,
      title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers}, 
      author={Di Qiu and Zhengcong Fei and Rui Wang and Jialin Bai and Changqian Yu and Mingyuan Fan and Guibin Chen and Xiang Wen},
      year={2025},
      eprint={2502.10841},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.10841}, 
}