HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Abstract
HiWave enhances ultra-high-resolution image synthesis with pretrained diffusion models through a two-stage pipeline that combines patch-wise DDIM inversion with wavelet-based detail enhancement, improving visual fidelity and reducing artifacts.
Diffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image with the pretrained model, followed by a patch-wise DDIM inversion step and a novel wavelet-based detail-enhancer module. Specifically, we first use DDIM inversion to derive initial noise vectors that preserve the global coherence of the base image. Then, during sampling, our wavelet-domain detail enhancer retains the low-frequency components of the base image to ensure structural consistency, while selectively guiding the high-frequency components to enrich fine details and textures. Extensive evaluations with Stable Diffusion XL demonstrate that HiWave effectively mitigates the visual artifacts common in prior methods while achieving superior perceptual quality. A user study confirmed these results: participants preferred HiWave over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without retraining or architectural modifications.
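To make the first stage concrete, here is a minimal sketch of deterministic DDIM inversion in PyTorch. The helper name `ddim_invert` and the `eps_model` callable (wrapping the pretrained noise predictor, e.g. the SDXL UNet with fixed prompt embeddings) are assumptions for illustration, and the patch splitting and stitching that make the step patch-wise are omitted; this is the textbook DDIM inversion recursion, not HiWave's exact code.

```python
import torch

@torch.no_grad()
def ddim_invert(latent, eps_model, alphas_cumprod, timesteps):
    """Deterministic DDIM inversion: map a clean latent back to noise.

    `eps_model(x, t)` wraps the pretrained denoiser (hypothetical helper);
    `alphas_cumprod` is the scheduler's cumulative alpha-bar table;
    `timesteps` is an increasing sequence from low noise to high noise.
    """
    x = latent
    a_prev = x.new_tensor(1.0)  # alpha-bar at the (nearly) clean state
    for t in timesteps:
        # Standard approximation: reuse the noise predicted at the current state.
        eps = eps_model(x, t)
        a_t = alphas_cumprod[t]
        # Recover the x0 estimate at the current noise level...
        x0_pred = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        # ...then re-noise it deterministically to the next (higher) level.
        x = a_t.sqrt() * x0_pred + (1 - a_t).sqrt() * eps
        a_prev = a_t
    return x  # noise vector from which DDIM sampling re-derives `latent`
```

Running standard DDIM sampling from the returned noise approximately reproduces the base image, which is what lets the high-resolution pass stay anchored to its global layout.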
Community
TLDR: HiWave is a simple, training-free approach for synthesizing high-resolution images beyond the original training resolution of diffusion models. It combines patch-wise DDIM inversion with a wavelet-based detail enhancement module to produce coherent, highly detailed 4K images without object duplication.
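To illustrate the second stage, below is a minimal sketch of the wavelet-domain merge using PyWavelets. The function name `wavelet_detail_merge`, the `detail_scale` knob, and the single-level `db4` transform are illustrative assumptions; in the paper the low-/high-frequency split is applied during sampling rather than as the one-off post-hoc blend shown here.

```python
import numpy as np
import pywt

def wavelet_detail_merge(base_pred, patch_pred, detail_scale=1.0, wavelet="db4"):
    """Merge two predictions in the wavelet domain (illustrative sketch).

    Keeps the low-frequency (approximation) band of `base_pred` to preserve
    global structure, and takes the high-frequency (detail) bands from
    `patch_pred`, scaled by `detail_scale` to enrich textures.
    Inputs are float arrays of shape (C, H, W).
    """
    merged = np.empty_like(base_pred)
    for c in range(base_pred.shape[0]):
        # Single-level 2D DWT: cA holds low frequencies, (cH, cV, cD) details.
        base_lo, _ = pywt.dwt2(base_pred[c], wavelet)
        _, (cH, cV, cD) = pywt.dwt2(patch_pred[c], wavelet)
        # Recombine base low frequencies with (scaled) patch high frequencies.
        recon = pywt.idwt2(
            (base_lo, (detail_scale * cH, detail_scale * cV, detail_scale * cD)),
            wavelet,
        )
        merged[c] = recon[: base_pred.shape[1], : base_pred.shape[2]]
    return merged

# Toy usage with SDXL-sized latents (4 channels, 128x128 for a 1024px image).
base = np.random.randn(4, 128, 128).astype(np.float32)
patch = np.random.randn(4, 128, 128).astype(np.float32)
out = wavelet_detail_merge(base, patch, detail_scale=1.2)
assert out.shape == base.shape
```

Keeping the approximation band from the base image is what suppresses object duplication: the global layout is fixed in the low frequencies, so the patch-wise pass can only add texture, not rearrange content.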