arXiv:2509.24979

Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel

Published on Sep 29, 2025
AI-generated summary

Wan-Alpha, a new framework, generates high-quality RGBA videos by jointly learning the RGB and alpha channels with a variational autoencoder and a diffusion transformer.

Abstract

RGBA video generation, which includes an alpha channel to represent transparency, is gaining increasing attention across a wide range of applications. However, existing methods often neglect visual quality, limiting their practical usability. In this paper, we propose Wan-Alpha, a new framework that generates transparent videos by learning the RGB and alpha channels jointly. We design an effective variational autoencoder (VAE) that encodes the alpha channel into the RGB latent space. Then, to support the training of our diffusion transformer, we construct a high-quality and diverse RGBA video dataset. Compared with state-of-the-art methods, our model demonstrates superior performance in visual quality, motion realism, and transparency rendering. Notably, our model can generate a wide variety of semi-transparent objects, glowing effects, and fine-grained details such as hair strands. The released model is available on our website: https://donghaotian123.github.io/Wan-Alpha/
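The key architectural idea in the abstract is that the alpha channel is encoded into the same latent space as the RGB frames, so one diffusion transformer can denoise a single joint latent. Below is a minimal PyTorch sketch of that idea; the module names (`JointRGBAEncoder`), the parallel alpha branch, the fusion by addition, and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class JointRGBAEncoder(nn.Module):
    """Sketch: map RGBA video into one shared latent for a diffusion model.

    Assumptions (not from the paper): single-conv stand-ins for the video
    VAE encoders and additive fusion of the RGB and alpha streams.
    """

    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # Stand-in for a pretrained RGB video VAE encoder.
        self.rgb_encoder = nn.Conv3d(3, latent_dim, kernel_size=4, stride=4)
        # Parallel branch that projects the alpha channel into the
        # same latent space as the RGB stream (assumed design).
        self.alpha_encoder = nn.Conv3d(1, latent_dim, kernel_size=4, stride=4)

    def forward(self, rgba: torch.Tensor) -> torch.Tensor:
        # rgba: (batch, 4, frames, height, width), values in [0, 1]
        rgb, alpha = rgba[:, :3], rgba[:, 3:]
        z_rgb = self.rgb_encoder(rgb)
        z_alpha = self.alpha_encoder(alpha)
        # Fuse both streams into a single latent; a diffusion transformer
        # trained on this latent then learns RGB and alpha jointly.
        return z_rgb + z_alpha


if __name__ == "__main__":
    encoder = JointRGBAEncoder()
    video = torch.rand(1, 4, 16, 64, 64)  # 16 RGBA frames at 64x64
    latent = encoder(video)
    print(latent.shape)  # torch.Size([1, 16, 4, 16, 16])
```

One appeal of this design, as described in the abstract, is that transparency comes at no extra cost to the generator: the diffusion transformer operates on one latent of the usual shape, and the decoder recovers RGB and alpha together.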

Models citing this paper: 2
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 0