arxiv:2509.22414

LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Published on Sep 26 · Submitted by Owen on Sep 29 · W2GenAI Lab

Abstract

AI-generated summary: LucidFlux, a caption-free UIR framework using a diffusion transformer, achieves robust image restoration through adaptive conditioning and SigLIP features without text prompts.

Universal image restoration (UIR) aims to recover images degraded by unknown mixtures while preserving semantics -- conditions under which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a caption-free UIR framework that adapts a large diffusion transformer (Flux.1) without image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and a lightly restored proxy to respectively anchor geometry and suppress artifacts. Then, a timestep- and layer-adaptive modulation schedule is designed to route these cues across the backbone's hierarchy, in order to yield coarse-to-fine and context-aware updates that protect the global structure while recovering texture. After that, to avoid the latency and instability of text prompts or MLLM captions, we enforce caption-free semantic alignment via SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data for structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies verify the necessity of each component. LucidFlux shows that, for large DiTs, when, where, and what to condition on -- rather than adding parameters or relying on text prompts -- is the governing lever for robust and caption-free universal image restoration in the wild.
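
To make the "when and where to condition" idea concrete, here is a minimal sketch of a timestep- and layer-dependent gate that mixes the two conditioning branches (degraded input and lightly restored proxy) into a hidden state. Everything in it is an assumption for illustration: the function names, the hand-written sigmoid gate, the `sharpness` parameter, and the convention that `t = 1.0` is the noisiest step are not taken from the paper, where the schedule is presumably learned rather than fixed.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def branch_weights(t: float, layer: int, num_layers: int,
                   sharpness: float = 4.0) -> Tuple[float, float]:
    """Return (w_degraded, w_proxy) for one timestep and one layer.

    Assumptions (hypothetical, not from the paper):
      - t lies in [0, 1], with 1.0 the noisiest step;
      - noisy steps and shallow layers lean on the degraded input
        (geometry anchor), while late steps and deep layers lean on
        the proxy (artifact suppression).
    """
    depth = layer / max(num_layers - 1, 1)  # normalize layer index to [0, 1]
    w_degraded = sigmoid(sharpness * (t - depth))
    return w_degraded, 1.0 - w_degraded


def modulate(hidden: List[float], degraded_feat: List[float],
             proxy_feat: List[float], t: float, layer: int,
             num_layers: int) -> List[float]:
    """Add a gated mix of both conditioning signals to the hidden state."""
    w_d, w_p = branch_weights(t, layer, num_layers)
    return [h + w_d * d + w_p * p
            for h, d, p in zip(hidden, degraded_feat, proxy_feat)]


if __name__ == "__main__":
    # Print the degraded-branch weight per layer at three timesteps.
    for t in (1.0, 0.5, 0.0):
        row = [round(branch_weights(t, layer, 4)[0], 2) for layer in range(4)]
        print(f"t={t}: {row}")
```

The point of the sketch is only that the mixing weight is a function of both timestep and depth, so no parameters are added to the backbone itself; a trained system would replace the fixed sigmoid with a learned mapping.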

Community

Owen777

Paper submitter 12 days ago

LucidFlux: Caption-Free Universal Image Restoration with Large DiTs
TL;DR: We adapt a large diffusion transformer (Flux.1) for UIR without captions. A dual-branch conditioner (degraded input + lightly restored proxy) anchors geometry and suppresses artifacts, while a timestep & layer-adaptive modulation routes guidance across the DiT for coarse-to-fine updates. SigLIP-based alignment preserves semantics without prompts/VLM latency. A 3-stage data curation pipeline yields structure-rich training sets. Across synthetic & in-the-wild benchmarks, LucidFlux outperforms strong open-/closed-source baselines; ablations confirm each component.
Why it matters: LucidFlux moves UIR beyond UNet/ControlNet sprawl; when, where, and what to condition on becomes the lever for robust, caption-free restoration.

Project page: https://w2genai-lab.github.io/LucidFlux
Code: https://github.com/W2GenAI-Lab/LucidFlux