arxiv:2504.07963

PixelFlow: Pixel-Space Generative Models with Flow

Published on Apr 10
· Submitted by ShoufaChen on Apr 14

Abstract

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
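
To make the phrase "cascade flow modeling" a little more concrete, here is a minimal toy sketch of coarse-to-fine flow sampling: integrate a velocity field with Euler steps at a low resolution, upsample, and continue at the next resolution. Everything in it is an assumption made for illustration and is not taken from the paper or the PixelFlow repository. `toy_velocity` is a hypothetical stand-in for a learned network, and the constant "target image", nearest-neighbour upsampling, stage count, and step counts are arbitrary choices. The real method may differ substantially, for example in how it moves between resolutions.

```python
import random


def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2D list-of-lists image."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    On a straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries the current sample to x_1 at t = 1 is (x_1 - x_t) / (1 - t).
    Here x_1 is a constant image filled with `target` (a toy assumption).
    """
    return [[(target - v) / (1.0 - t) for v in row] for row in x]


def euler_stage(x, t_start, t_end, steps, target):
    """Integrate dx/dt = v(x, t) from t_start to t_end with Euler steps."""
    dt = (t_end - t_start) / steps
    for i in range(steps):
        t = t_start + i * dt  # always strictly below t_end, so 1 - t > 0
        v = toy_velocity(x, t, target)
        x = [[xv + dt * vv for xv, vv in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x


def cascade_sample(base_size=4, num_stages=3, steps_per_stage=10,
                   target=0.5, seed=0):
    """Run the toy cascade: each stage covers one slice of [0, 1] in time,
    and the image is upsampled between stages."""
    rng = random.Random(seed)
    # Start from Gaussian noise at the lowest resolution.
    x = [[rng.gauss(0.0, 1.0) for _ in range(base_size)]
         for _ in range(base_size)]
    for s in range(num_stages):
        t0 = s / num_stages
        t1 = (s + 1) / num_stages
        x = euler_stage(x, t0, t1, steps_per_stage, target)
        if s < num_stages - 1:
            x = upsample2x(x)
    return x


if __name__ == "__main__":
    image = cascade_sample()
    print(len(image), len(image[0]))
```

The point of the sketch is only the control flow: most integration steps run on small images, and only the last slice of the trajectory is computed at full resolution, which is one plausible reading of how a cascade could keep pixel-space cost manageable.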

Community

Paper submitter

PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models.

Have you taken any inspiration from "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" (https://arxiv.org/abs/2404.02905)?

I do like the approach. Finding paths not only between same-resolution distributions but also integrating upscaling into the process sounds quite elegant. Though, in my eyes, the computational overhead from this will make scaling difficult, right?

Sounds like Next-level de-convolutions, congrats! Add ControlNets into the mix - and it will be a killer model for general use, imho
