SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
Abstract
SoftCFG, an uncertainty-guided inference method, enhances autoregressive image generation by distributing adaptive perturbations across generated tokens and stabilizing long-sequence generation, improving image quality and achieving state-of-the-art FID among autoregressive models on ImageNet 256×256.
Autoregressive (AR) models have emerged as powerful tools for image generation by modeling images as sequences of discrete tokens. While Classifier-Free Guidance (CFG) has been adopted to improve conditional generation, its application in AR models faces two key issues: guidance diminishing, where the conditional-unconditional gap quickly vanishes as decoding progresses, and over-guidance, where strong conditions distort visual coherence. To address these challenges, we propose SoftCFG, an uncertainty-guided inference method that distributes adaptive perturbations across all tokens in the sequence. The key idea behind SoftCFG is to let each generated token contribute certainty-weighted guidance, ensuring that the signal persists across steps while resolving conflicts between text guidance and visual context. To further stabilize long-sequence generation, we introduce Step Normalization, which bounds cumulative perturbations of SoftCFG. Our method is training-free, model-agnostic, and seamlessly integrates with existing AR pipelines. Experiments show that SoftCFG significantly improves image quality over standard CFG and achieves state-of-the-art FID on ImageNet 256×256 among autoregressive models.
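To make the abstract's two ingredients concrete, below is a minimal, hypothetical sketch of certainty-weighted guidance combined with Step Normalization at a single decoding step. It assumes token certainty is approximated by each generated token's probability, that the "negative" branch softly attenuates context embeddings in proportion to token uncertainty, and that Step Normalization bounds the per-step perturbation norm; the toy model, helper names, and exact perturbation form are illustrative assumptions, not the paper's official implementation (see the repository below for that).

```python
# Illustrative sketch only: certainty-weighted guidance + Step Normalization,
# combined with the usual CFG logit interpolation. The toy model and all
# helper names are assumptions, not the official SoftCFG code.
import torch
import torch.nn as nn


class ToyARModel(nn.Module):
    """Stand-in for a decoder-only AR image model (illustrative only)."""

    def __init__(self, vocab=1024, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.cond_proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, vocab)

    def forward_embeds(self, tok_embed, cond_embed):
        # Mean-pool visual context + condition -> next-token logits (toy dynamics).
        h = tok_embed.mean(dim=1) + self.cond_proj(cond_embed)
        return self.head(h)  # (B, vocab)


def step_normalize(perturb, max_norm=1.0):
    """Step Normalization (assumed form): bound the total perturbation per step."""
    norm = perturb.flatten(1).norm(dim=-1).clamp(min=1e-8)
    scale = (max_norm / norm).clamp(max=1.0)
    return perturb * scale.view(-1, 1, 1)


def softcfg_logits(model, tokens, certainty, cond_embed, uncond_embed, w=3.0):
    """One decoding step of (assumed) SoftCFG-style guidance.

    The negative branch does not only drop the condition; it also attenuates
    each previously generated token's embedding in proportion to its
    uncertainty (1 - certainty), so every token contributes certainty-weighted
    guidance and the signal persists across steps.
    """
    tok_embed = model.embed(tokens)  # (B, T, D)
    cond_logits = model.forward_embeds(tok_embed, cond_embed)

    perturb = step_normalize(-(1.0 - certainty).unsqueeze(-1) * tok_embed)
    neg_logits = model.forward_embeds(tok_embed + perturb, uncond_embed)

    return neg_logits + w * (cond_logits - neg_logits)  # standard CFG mix


if __name__ == "__main__":
    B, T, D, V = 2, 16, 64, 1024
    model = ToyARModel(vocab=V, dim=D)
    tokens = torch.randint(0, V, (B, T))
    certainty = torch.rand(B, T)  # e.g. probability assigned to each token when it was sampled
    cond, uncond = torch.randn(B, D), torch.zeros(B, D)
    print(softcfg_logits(model, tokens, certainty, cond, uncond).shape)  # torch.Size([2, 1024])
```

Because the method only changes how logits are combined at inference time, a sketch like this slots into an existing AR sampling loop without retraining, which is what makes it training-free and model-agnostic.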
Community
Official Code for SoftCFG
We are excited to release the official implementation of SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Models.
- GitHub Repository: https://github.com/Xudangliatiger/SoftCFG
- Key Features: uncertainty-guided perturbation, Step Normalization, and compatibility with AliTok and RAR.
- Installation: Follow the README for setup.
For questions, open an issue in the repo.