FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
Abstract
FlashEdit enables real-time, high-fidelity image editing with diffusion models through efficient inversion, background preservation, and localized attention mechanisms.
Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity, while performing edits in under 0.2 seconds, a more than 150× speedup over prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
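The abstract sketches two localization mechanisms: BG-Shield, which restricts feature changes to the edit region, and SSCA, which keeps the edit prompt's cross-attention from leaking into the background. Below is a minimal PyTorch-style sketch of both ideas; the function names, tensor shapes, and masking strategy are illustrative assumptions, not the paper's released implementation.

```python
import torch

def bg_shield(edited_feat, source_feat, edit_mask):
    """Illustrative BG-Shield sketch (assumed, not the authors' code):
    keep the source features outside the edit region, the edited features inside.

    edited_feat, source_feat: (B, C, H, W) backbone feature maps
    edit_mask: (B, 1, H, W) binary mask, 1 inside the edit region
    """
    return edit_mask * edited_feat + (1.0 - edit_mask) * source_feat

def sparsified_cross_attention(q, k, v, edit_mask_flat):
    """Illustrative SSCA sketch (assumed): standard text-to-image cross-attention
    whose output is suppressed at background positions, so edit-prompt semantics
    cannot leak outside the masked region.

    q: (B, N, d) image-token queries; k, v: (B, T, d) text-token keys/values
    edit_mask_flat: (B, N, 1) binary mask over flattened spatial positions
    """
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return (attn @ v) * edit_mask_flat

if __name__ == "__main__":
    # Toy check: background features are left untouched by bg_shield.
    src = torch.randn(1, 64, 32, 32)
    edit = torch.randn(1, 64, 32, 32)
    mask = torch.zeros(1, 1, 32, 32)
    mask[:, :, 8:24, 8:24] = 1.0  # hypothetical edit region
    blended = bg_shield(edit, src, mask)
    assert torch.allclose(blended[:, :, 0, 0], src[:, :, 0, 0])
```

The paper applies these ideas inside its one-step inversion-and-editing pipeline; the sketch above only illustrates the masking logic described in the abstract.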
Community
This paper shares a very similar idea with SwiftEdit (https://arxiv.org/abs/2412.04301, CVPR 2025), yet SwiftEdit is neither cited nor discussed.
Hi,
I am the first author of SwiftEdit (https://swift-edit.github.io/), which was accepted to CVPR 2025 prior to this work. I found that this work strongly resembles SwiftEdit in its core idea, architectural design, training strategy, and editing process.
While I fully support open research and welcome contributions that build upon our work, proper citation and acknowledgment are fundamental to research integrity. I am concerned that SwiftEdit is not cited here despite the significant overlap. I kindly request that the authors update this project page and the associated paper to give appropriate credit.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API.
- TweezeEdit: Consistent and Efficient Image Editing with Path Regularization (2025)
- ContextFlow: Training-Free Video Object Editing via Adaptive Context Enrichment (2025)
- LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing (2025)
- Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control (2025)
- Visual Autoregressive Modeling for Instruction-Guided Image Editing (2025)
- CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing (2025)
- Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score (2025)