arxiv:2503.08096

MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution

Published on Mar 11

Abstract

AI-generated summary: MegaSR adapts text-to-image diffusion models to image super-resolution by injecting customized block-wise semantics and by aggregating expressive guidance (HED edge maps, depth maps, and segmentation maps) across multiple stages.

Pioneering text-to-image (T2I) diffusion models have ushered in a new era of real-world image super-resolution (Real-ISR), significantly enhancing the visual perception of reconstructed images. However, existing methods typically integrate uniform abstract textual semantics across all blocks, overlooking the distinct semantic requirements at different depths and the fine-grained, concrete semantics inherently present in the images themselves. Moreover, relying solely on a single type of guidance further disrupts the consistency of reconstruction. To address these issues, we propose MegaSR, a novel framework that mines customized block-wise semantics and expressive guidance for diffusion-based ISR. Compared to uniform textual semantics, MegaSR enables flexible adaptation to multi-granularity semantic awareness by dynamically incorporating image attributes at each block. Furthermore, we experimentally identify HED edge maps, depth maps, and segmentation maps as the most expressive guidance, and propose a multi-stage aggregation strategy to modulate them into the T2I models. Extensive experiments demonstrate the superiority of MegaSR in terms of semantic richness and structural consistency.
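The abstract does not spell out how the guidance maps are aggregated, so the following is only a toy sketch of the general idea of injecting several guidance maps at multiple feature resolutions. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`avg_pool`, `fuse`, `modulate`, `aggregate_multistage`), the weighted-sum fusion, the average-pool resizing, and the multiplicative modulation rule.

```python
from typing import List

Map2D = List[List[float]]


def avg_pool(m: Map2D, factor: int) -> Map2D:
    """Downsample a 2D map by averaging non-overlapping factor x factor tiles."""
    h, w = len(m) // factor, len(m[0]) // factor
    return [
        [
            sum(
                m[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def fuse(maps: List[Map2D], weights: List[float]) -> Map2D:
    """Weighted sum of same-sized guidance maps (hypothetical fusion rule)."""
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * m[i][j] for m, wt in zip(maps, weights)) for j in range(w)]
        for i in range(h)
    ]


def modulate(feat: Map2D, guide: Map2D, strength: float = 1.0) -> Map2D:
    """Scale each feature by (1 + strength * guide); an assumed modulation rule."""
    return [
        [f * (1.0 + strength * g) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(feat, guide)
    ]


def aggregate_multistage(
    features: List[Map2D], guidance: List[Map2D], weights: List[float]
) -> List[Map2D]:
    """Resize the guidance to each stage's resolution, fuse it, and modulate."""
    out = []
    for feat in features:
        # Assumes square stages whose size divides the guidance size evenly.
        factor = len(guidance[0]) // len(feat)
        pooled = [avg_pool(g, factor) for g in guidance]
        out.append(modulate(feat, fuse(pooled, weights)))
    return out


# Toy inputs: constant 4x4 "edge", "depth", and "segmentation" maps.
edge = [[1.0] * 4 for _ in range(4)]
depth = [[0.5] * 4 for _ in range(4)]
seg = [[0.0] * 4 for _ in range(4)]

# Two feature stages at 4x4 and 2x2 resolution.
stages = [[[2.0] * 4 for _ in range(4)], [[2.0] * 2 for _ in range(2)]]
result = aggregate_multistage(stages, [edge, depth, seg], [0.5, 0.25, 0.25])
```

In a real diffusion backbone the features would be multi-channel tensors and the fusion would be learned; the sketch only shows the data flow of resizing one set of guidance maps to each stage before modulation.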
