Frequency Dynamic Convolution for Dense Image Prediction
Abstract
While Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism, the frequency responses of these weights tend to exhibit high similarity, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands in the frequency domain and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv seamlessly integrates into a variety of architectures, including ConvNeXt and Swin Transformer, offering a flexible and efficient solution for modern vision tasks. The code is made publicly available at https://github.com/Linwei-Chen/FDConv.
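The core idea of frequency-based groups with disjoint Fourier indices can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the round-robin index assignment is an illustrative partitioning scheme, and the kernels are kept complex for simplicity (a real-valued variant would enforce Hermitian symmetry on the coefficients). It shows how several kernels with non-overlapping frequency content can be derived from a single shared parameter budget.

```python
import numpy as np

def frequency_disjoint_kernels(coeffs, n_kernels):
    """Build n_kernels spatial kernels from ONE shared Fourier-coefficient
    budget by assigning each kernel a disjoint group of Fourier indices.

    coeffs: complex array of shape (H, W), the shared parameter budget.
    Returns a list of n_kernels complex spatial-domain kernels.
    """
    h, w = coeffs.shape
    flat_idx = np.arange(h * w)
    kernels = []
    for k in range(n_kernels):
        # Disjoint index group k: round-robin split of all Fourier indices
        # (an illustrative choice, not the paper's grouping scheme).
        mask = (flat_idx % n_kernels == k).reshape(h, w)
        masked = np.where(mask, coeffs, 0)
        # Inverse FFT maps the group's coefficients back to a spatial kernel.
        kernels.append(np.fft.ifft2(masked))
    return kernels

rng = np.random.default_rng(0)
budget = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
ks = frequency_disjoint_kernels(budget, 4)
```

Because the index groups partition the budget and the FFT is linear, summing the four kernels and transforming back recovers the full coefficient set, confirming that no parameters are duplicated across kernels.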
Community
TL;DR: With merely 1/20 of the extra parameters, FDConv surpasses CondConv/DY-Conv! A team from Beijing Institute of Technology and the University of Tokyo presents FDConv. Through Fourier-domain parameter decoupling plus joint spatial-frequency modulation, the weights of dynamic convolution genuinely achieve "high frequencies for details, low frequencies for noise suppression," setting new SOTA results across detection and segmentation tasks!
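The "high-frequency for details, low-frequency for noise reduction" behavior comes from modulating frequency bands of the weights independently. Below is a minimal sketch of that band-decomposition idea, assuming a simple radial cutoff and global per-band gains; the paper's FBM instead modulates bands dynamically per spatial location based on local content, so treat the function name and parameters as hypothetical.

```python
import numpy as np

def band_modulate(kernel, low_gain, high_gain, cutoff=0.25):
    """Split a kernel's spectrum into a low- and a high-frequency band,
    scale each band independently, and recombine in the spatial domain."""
    h, w = kernel.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    low_mask = radius <= cutoff          # low-frequency band (radial cutoff)
    spec = np.fft.fft2(kernel)
    # Gains select how much of each band survives in the output kernel.
    spec = np.where(low_mask, low_gain * spec, high_gain * spec)
    return np.fft.ifft2(spec).real

k = np.random.default_rng(1).standard_normal((7, 7))
smooth = band_modulate(k, low_gain=1.0, high_gain=0.0)  # low band only
sharp = band_modulate(k, low_gain=0.0, high_gain=1.0)   # high band only
```

Since the two masks partition the spectrum, the low-only and high-only kernels sum back to the original, i.e., the decomposition is lossless and any per-band gains simply reweight the two components.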
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- FwNet-ECA: A Classification Model Enhancing Window Attention with Global Receptive Fields via Fourier Filtering Operations (2025)
- A super-resolution reconstruction method for lightweight building images based on an expanding feature modulation network (2025)
- FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off (2025)
- Dual-domain Modulation Network for Lightweight Image Super-Resolution (2025)
- ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation (2025)
- OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels (2025)
- iFormer: Integrating ConvNet and Transformer for Mobile Application (2025)