Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel
Abstract
Wan-Alpha is a new framework that generates high-quality RGBA videos by jointly learning the RGB and alpha channels with a variational autoencoder and a diffusion transformer.
RGBA video generation, which includes an alpha channel to represent transparency, is gaining increasing attention across a wide range of applications. However, existing methods often neglect visual quality, limiting their practical usability. In this paper, we propose Wan-Alpha, a new framework that generates transparent videos by learning both RGB and alpha channels jointly. We design an effective variational autoencoder (VAE) that encodes the alpha channel into the RGB latent space. Then, to support the training of our diffusion transformer, we construct a high-quality and diverse RGBA video dataset. Compared with state-of-the-art methods, our model demonstrates superior performance in visual quality, motion realism, and transparency rendering. Notably, our model can generate a wide variety of semi-transparent objects, glowing effects, and fine-grained details such as hair strands. The released model is available on our website: https://donghaotian123.github.io/Wan-Alpha/
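The practical payoff of the alpha channel is that generated clips can be composited directly onto arbitrary backgrounds. As a minimal illustration of this downstream use (not code from the paper; the function name, array shapes, and straight-alpha convention are assumptions), the standard "over" operator blends an RGBA frame with a background frame:

```python
import numpy as np

def composite_over(rgba_frame: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Composite one straight-alpha RGBA frame over an RGB background.

    rgba_frame: (H, W, 4) float array in [0, 1]; channel 3 is alpha.
    background: (H, W, 3) float array in [0, 1].
    Returns an (H, W, 3) RGB frame.
    """
    rgb = rgba_frame[..., :3]
    alpha = rgba_frame[..., 3:4]  # keep a trailing axis for broadcasting
    # "Over" operator: out = alpha * foreground + (1 - alpha) * background.
    # Semi-transparent regions (0 < alpha < 1) blend the two layers.
    return alpha * rgb + (1.0 - alpha) * background

# Example: place a hypothetical generated frame over a solid gray background.
frame = np.random.rand(480, 640, 4).astype(np.float32)
gray = np.full((480, 640, 3), 0.5, dtype=np.float32)
out = composite_over(frame, gray)
```

Applied per frame, this is what makes semi-transparent objects, glows, and hair strands render cleanly over new footage.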