Abstract
The proposed Text-Aware Image Restoration (TAIR) system integrates a multi-task diffusion framework with a text-spotting module to enhance both image recovery and textual fidelity, outperforming existing diffusion-based methods.
Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Dual Prompting Image Restoration with Diffusion Transformers (2025)
- TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance (2025)
- LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter (2025)
- Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration (2025)
- UniRes: Universal Image Restoration for Complex Degradations (2025)
- DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution (2025)
- GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper