VanLinLin committed on
Commit 9e26714 · 1 Parent(s): 5bc1de8

update files

README.md CHANGED
@@ -1,10 +1,33 @@
1
- # [TEAM ACVLAB][NTIRE25-Image Shadow Removal Challenge](https://cvlai.net/ntire/2025/) @ [CVPR 2025](https://cvpr.thecvf.com/)
 
 
2
 
3
- ## Link to the codes/executables of the solution(s):
4
- * [Checkpoints](https://drive.google.com/file/d/1USD5sLvEcgFqIg7BDzc1OuInzSx3GnUN/view?usp=drive_link)
5
- * Input / Output file
6
 
7
- ## Environments
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ```bash
9
  conda create -n ntire_shadow python=3.9 -y
10
 
@@ -16,15 +39,16 @@ pip install -r requirements.txt
16
 
17
  ```
18
 
19
- ## Folder Structure
 
20
  ```bash
21
  test_dir
22
- ├── Origin <- Put the shadow affected images in this folder
23
  │ ├── 0000.png
24
  │ ├── 0001.png
25
  │ ├── ...
26
- ├── Depth
27
- ├── Normal
28
 
29
 
30
  output_dir
@@ -33,7 +57,7 @@ output_dir
33
  ├──...
34
  ```
35
 
36
- ## How to test?
37
  1. Clone [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2.git)
38
 
39
  ```bash
@@ -49,15 +73,15 @@ python get_depth_normap.py
49
  Now the folder structure will be
50
  ```bash
51
  test_dir
52
- ├── Origin
53
  │ ├── 0000.png
54
  │ ├── 0001.png
55
  │ ├── ...
56
- ├── Depth
57
  │ ├── 0000.npy
58
  │ ├── 0001.npy
59
  │ ├── ...
60
- ├── Normal
61
  │ ├── 0000.npy
62
  │ ├── 0001.npy
63
  │ ├── ...
@@ -68,15 +92,15 @@ output_dir
68
  ├──...
69
  ```
70
 
71
- 1. Clone [DINOv2](https://github.com/facebookresearch/dinov2.git)
72
  ```bash
73
  git clone https://github.com/facebookresearch/dinov2.git
74
  ```
75
 
76
- 1. Download [shadow removal weight](https://drive.google.com/file/d/1USD5sLvEcgFqIg7BDzc1OuInzSx3GnUN/view?usp=drive_link)
77
 
78
  ```bash
79
- gdown 1USD5sLvEcgFqIg7BDzc1OuInzSx3GnUN
80
  ```
81
 
82
  6. Run ```run_test.sh``` to get inference results.
@@ -84,5 +108,17 @@ gdown 1USD5sLvEcgFqIg7BDzc1OuInzSx3GnUN
84
  ```bash
85
  bash run_test.sh
86
  ```
87
- ## License and Acknowledgement
 
 
 
 
 
 
 
 
 
 
 
 
88
  This code repository is released under the [MIT License](https://github.com/VanLinLin/NTIRE25_Shadow_Removal?tab=MIT-1-ov-file#readme).
 
1
+ <h1 align="center">[ACMMM 2025] DenseSR: Image Shadow Removal as Dense Prediction</h1>
2
+ <p align="center">Yu-Fan Lin<sup>1</sup>, Chia-Ming Lee<sup>1</sup>, Chih-Chung Hsu<sup>2</sup></p>
3
+ <p align="center"><sup>1</sup>National Cheng Kung University&nbsp;&nbsp;<sup>2</sup>National Yang Ming Chiao Tung University</p>
4
 
5
+ <div align="center">
 
 
6
 
7
+ [![arXiv](https://img.shields.io/badge/DenseSR-arXiv-red.svg)](https://www.arxiv.org/abs/2507.16472)
8
+
9
+ </div>
10
+
11
+ <details>
12
+ <summary>Abstract</summary>
13
+ Shadows are a common factor degrading image quality. Single-image shadow removal (SR), particularly under challenging indirect illumination, is hampered by non-uniform content degradation and inherent ambiguity. Consequently, traditional methods often fail to simultaneously recover intra-shadow details and maintain sharp boundaries, resulting in inconsistent restoration and blurring that negatively affect both downstream applications and the overall viewing experience. To overcome these limitations, we propose DenseSR, which approaches the problem from a dense prediction perspective to emphasize restoration quality. This framework uniquely synergizes two key strategies: (1) deep scene understanding guided by geometric-semantic priors to resolve ambiguity and implicitly localize shadows, and (2) high-fidelity restoration via a novel Dense Fusion Block (DFB) in the decoder. The DFB employs adaptive component processing, using an Adaptive Content Smoothing Module (ACSM) for consistent appearance and a Texture-Boundary Recuperation Module (TBRM) for fine textures and sharp boundaries, thereby directly tackling the inconsistent restoration and blurring issues. These purposefully processed components are effectively fused, yielding an optimized feature representation that preserves both consistency and fidelity. Extensive experimental results demonstrate the merits of our approach over existing methods.
14
+ </details>
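As a rough, purely illustrative sketch of the idea behind the DFB described above (smooth the content for consistent appearance, recover high-frequency detail separately, then fuse), something like the following captures the spirit; the actual DFB is implemented in `densefusion.py` and differs in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenseFusionBlock(nn.Module):
    """Illustrative only: a smoothing branch (ACSM-like) plus a
    high-frequency branch (TBRM-like), fused by a 1x1 conv.
    Not the paper's DFB."""

    def __init__(self, channels: int):
        super().__init__()
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1)
        self.detail = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        blurred = F.avg_pool2d(x, 3, stride=1, padding=1)
        low = self.smooth(blurred)            # consistent appearance
        high = self.detail(x - blurred)       # textures and boundaries
        return self.fuse(torch.cat([low, high], dim=1))

# quick shape check
y = ToyDenseFusionBlock(64)(torch.rand(1, 64, 128, 128))
print(y.shape)  # torch.Size([1, 64, 128, 128])
```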
15
+
16
+ ## ⭐ Citation
17
+ If you find this project useful, please consider citing us and giving us a star.
18
+ ```bibtex
19
+ @misc{lin2025densesrimageshadowremoval,
20
+ title={DenseSR: Image Shadow Removal as Dense Prediction},
21
+ author={Yu-Fan Lin and Chia-Ming Lee and Chih-Chung Hsu},
22
+ year={2025},
23
+ eprint={2507.16472},
24
+ archivePrefix={arXiv},
25
+ primaryClass={cs.CV},
26
+ url={https://arxiv.org/abs/2507.16472},
27
+ }
28
+ ```
29
+
30
+ ## 🌱 Environments
31
  ```bash
32
  conda create -n ntire_shadow python=3.9 -y
33
 
 
39
 
40
  ```
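In full, with the omitted middle of this block restored (the hunk header shows it ends with `pip install -r requirements.txt`), the environment setup is roughly:

```bash
conda create -n ntire_shadow python=3.9 -y
conda activate ntire_shadow   # activation step assumed; not shown in the diff
pip install -r requirements.txt
```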
41
 
42
+ ## 📂 Folder Structure
43
+ You can download the WSRD dataset from [here](https://github.com/fvasluianu97/WSRD-DNSR).
44
  ```bash
45
  test_dir
46
+ ├── origin <- Put the shadow-affected images in this folder
47
  │ ├── 0000.png
48
  │ ├── 0001.png
49
  │ ├── ...
50
+ ├── depth
51
+ ├── normal
52
 
53
 
54
  output_dir
 
57
  ├──...
58
  ```
59
 
60
+ ## How to test?
61
  1. Clone [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2.git)
62
 
63
  ```bash
 
73
  Now the folder structure will be
74
  ```bash
75
  test_dir
76
+ ├── origin
77
  │ ├── 0000.png
78
  │ ├── 0001.png
79
  │ ├── ...
80
+ ├── depth
81
  │ ├── 0000.npy
82
  │ ├── 0001.npy
83
  │ ├── ...
84
+ ├── normal
85
  │ ├── 0000.npy
86
  │ ├── 0001.npy
87
  │ ├── ...
 
92
  ├──...
93
  ```
94
 
95
+ 4. Clone [DINOv2](https://github.com/facebookresearch/dinov2.git)
96
  ```bash
97
  git clone https://github.com/facebookresearch/dinov2.git
98
  ```
99
 
100
+ 5. Download the [shadow removal weights](https://drive.google.com/file/d/1of3KLSVhaXlsX3jasuwdPKBwb4O4hGZD/view?usp=drive_link)
101
 
102
  ```bash
103
+ gdown 1of3KLSVhaXlsX3jasuwdPKBwb4O4hGZD
104
  ```
105
 
106
  6. Run ```run_test.sh``` to get inference results.
 
108
  ```bash
109
  bash run_test.sh
110
  ```
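Putting the steps above together, a complete inference run looks roughly like this (folder names follow the structure shown earlier):

```bash
# Geometric priors (Depth Anything V2) -> test_dir/depth and test_dir/normal
git clone https://github.com/DepthAnything/Depth-Anything-V2.git
python get_depth_normap.py

# Semantic priors (DINOv2)
git clone https://github.com/facebookresearch/dinov2.git

# Pretrained shadow-removal weights
gdown 1of3KLSVhaXlsX3jasuwdPKBwb4O4hGZD

# Inference; results are written to output_dir
bash run_test.sh
```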
111
+
112
+ ## 📰 News
113
+ &#10004; 2025/08/11 Release WSRD pretrained model
114
+
115
+ &#10004; 2025/08/11 Release inference code
116
+
117
+ &#10004; 2025/07/05 Paper Accepted by ACMMM'25
118
+
119
+ ## 🛠️ TODO
120
+ &#x25FB; Release training code
121
+ &#x25FB; Release other pretrained models
122
+
123
+ ## 📜 License and Acknowledgement
124
  This code repository is released under the [MIT License](https://github.com/VanLinLin/NTIRE25_Shadow_Removal?tab=MIT-1-ov-file#readme).
freqfusion.py → densefusion.py RENAMED
@@ -7,59 +7,41 @@ from torch.utils.checkpoint import checkpoint
7
  import warnings
8
  import numpy as np
9
 
10
- try:
11
- from mmcv.ops.carafe import normal_init, xavier_init, carafe
12
- except ImportError:
13
-
14
- def xavier_init(module: nn.Module,
15
- gain: float = 1,
16
- bias: float = 0,
17
- distribution: str = 'normal') -> None:
18
- assert distribution in ['uniform', 'normal']
19
- if hasattr(module, 'weight') and module.weight is not None:
20
- if distribution == 'uniform':
21
- nn.init.xavier_uniform_(module.weight, gain=gain)
22
- else:
23
- nn.init.xavier_normal_(module.weight, gain=gain)
24
- if hasattr(module, 'bias') and module.bias is not None:
25
- nn.init.constant_(module.bias, bias)
26
-
27
- def carafe(x, normed_mask, kernel_size, group=1, up=1):
28
- b, c, h, w = x.shape
29
- _, m_c, m_h, m_w = normed_mask.shape
30
- # print('x', x.shape)
31
- # print('normed_mask', normed_mask.shape)
32
- # assert m_c == kernel_size ** 2 * up ** 2
33
- assert m_h == up * h
34
- assert m_w == up * w
35
- pad = kernel_size // 2
36
- # print(pad)
37
- pad_x = F.pad(x, pad=[pad] * 4, mode='reflect')
38
- # print(pad_x.shape)
39
- unfold_x = F.unfold(pad_x, kernel_size=(kernel_size, kernel_size), stride=1, padding=0)
40
- # unfold_x = unfold_x.reshape(b, c, 1, kernel_size, kernel_size, h, w).repeat(1, 1, up ** 2, 1, 1, 1, 1)
41
- unfold_x = unfold_x.reshape(b, c * kernel_size * kernel_size, h, w)
42
- unfold_x = F.interpolate(unfold_x, scale_factor=up, mode='nearest')
43
- # normed_mask = normed_mask.reshape(b, 1, up ** 2, kernel_size, kernel_size, h, w)
44
- unfold_x = unfold_x.reshape(b, c, kernel_size * kernel_size, m_h, m_w)
45
- normed_mask = normed_mask.reshape(b, 1, kernel_size * kernel_size, m_h, m_w)
46
- res = unfold_x * normed_mask
47
- # test
48
- # res[:, :, 0] = 1
49
- # res[:, :, 1] = 2
50
- # res[:, :, 2] = 3
51
- # res[:, :, 3] = 4
52
- res = res.sum(dim=2).reshape(b, c, m_h, m_w)
53
- # res = F.pixel_shuffle(res, up)
54
- # print(res.shape)
55
- # print(res)
56
- return res
57
-
58
- def normal_init(module, mean=0, std=1, bias=0):
59
- if hasattr(module, 'weight') and module.weight is not None:
60
- nn.init.normal_(module.weight, mean, std)
61
- if hasattr(module, 'bias') and module.bias is not None:
62
- nn.init.constant_(module.bias, bias)
63
 
64
 
65
  def constant_init(module, val, bias=0):
@@ -90,26 +72,12 @@ def resize(input,
90
  return F.interpolate(input, size, scale_factor, mode, align_corners)
91
 
92
  def hamming2D(M, N):
93
- """
94
- 生成二维Hamming窗
95
-
96
- 参数:
97
- - M:窗口的行数
98
- - N:窗口的列数
99
-
100
- 返回:
101
- - 二维Hamming窗
102
- """
103
- # 生成水平和垂直方向上的Hamming窗
104
- # hamming_x = np.blackman(M)
105
- # hamming_x = np.kaiser(M)
106
  hamming_x = np.hamming(M)
107
  hamming_y = np.hamming(N)
108
- # 通过外积生成二维Hamming窗
109
  hamming_2d = np.outer(hamming_x, hamming_y)
110
  return hamming_2d
111
 
112
- class FreqFusion(nn.Module):
113
  def __init__(self,
114
  hr_channels,
115
  lr_channels,
@@ -122,14 +90,14 @@ class FreqFusion(nn.Module):
122
  compressed_channels=64,
123
  align_corners=False,
124
  upsample_mode='nearest',
125
- feature_resample=False, # use offset generator or not
126
  feature_resample_group=4,
127
- comp_feat_upsample=True, # use ALPF & AHPF for init upsampling
128
  use_high_pass=True,
129
  use_low_pass=True,
130
  hr_residual=True,
131
  semi_conv=True,
132
- hamming_window=True, # for regularization, do not matter really
133
  feature_resample_norm=True,
134
  **kwargs):
135
  super().__init__()
@@ -142,7 +110,7 @@ class FreqFusion(nn.Module):
142
  self.compressed_channels = compressed_channels
143
  self.hr_channel_compressor = nn.Conv2d(hr_channels, self.compressed_channels,1)
144
  self.lr_channel_compressor = nn.Conv2d(lr_channels, self.compressed_channels,1)
145
- self.content_encoder = nn.Conv2d( # ALPF generator
146
  self.compressed_channels,
147
  lowpass_kernel ** 2 * self.up_group * self.scale_factor * self.scale_factor,
148
  self.encoder_kernel,
@@ -178,6 +146,8 @@ class FreqFusion(nn.Module):
178
  self.register_buffer('hamming_lowpass', torch.FloatTensor([1.0]))
179
  self.register_buffer('hamming_highpass', torch.FloatTensor([1.0]))
180
  self.init_weights()
 
 
181
 
182
  def init_weights(self):
183
  for m in self.modules():
@@ -217,6 +187,15 @@ class FreqFusion(nn.Module):
217
  return self._forward(hr_feat, lr_feat)
218
 
219
  def _forward(self, hr_feat, lr_feat):
 
 
 
 
 
 
 
 
 
220
  compressed_hr_feat = self.hr_channel_compressor(hr_feat)
221
  compressed_lr_feat = self.lr_channel_compressor(lr_feat)
222
  if self.semi_conv:
@@ -250,6 +229,11 @@ class FreqFusion(nn.Module):
250
  mask_hr = self.content_encoder2(compressed_x)
251
 
252
  mask_lr = self.kernel_normalizer(mask_lr, self.lowpass_kernel, hamming=self.hamming_lowpass)
 
 
 
 
 
253
  if self.semi_conv:
254
  lr_feat = carafe(lr_feat, mask_lr, self.lowpass_kernel, self.up_group, 2)
255
  else:
@@ -263,24 +247,33 @@ class FreqFusion(nn.Module):
263
  if self.use_high_pass:
264
  mask_hr = self.kernel_normalizer(mask_hr, self.highpass_kernel, hamming=self.hamming_highpass)
265
  hr_feat_hf = hr_feat - carafe(hr_feat, mask_hr, self.highpass_kernel, self.up_group, 1)
 
266
  if self.hr_residual:
267
  # print('using hr_residual')
268
  hr_feat = hr_feat_hf + hr_feat
269
  else:
270
  hr_feat = hr_feat_hf
 
 
 
 
 
 
 
271
 
272
  if self.feature_resample:
273
  # print(lr_feat.shape)
274
  lr_feat = self.dysampler(hr_x=compressed_hr_feat,
275
  lr_x=compressed_lr_feat, feat2sample=lr_feat)
276
-
 
277
  return mask_lr, hr_feat, lr_feat
278
 
279
 
280
 
281
  class LocalSimGuidedSampler(nn.Module):
282
  """
283
- offset generator in FreqFusion
284
  """
285
  def __init__(self, in_channels, scale=2, style='lp', groups=4, use_direct_scale=True, kernel_size=1, local_window=3, sim_type='cos', norm=True, direction_feat='sim_concat'):
286
  super().__init__()
@@ -436,6 +429,6 @@ if __name__ == '__main__':
436
 
437
  hr_feat = torch.rand(1, 128, 512, 512)
438
  lr_feat = torch.rand(1, 128, 256, 256)
439
- model = FreqFusion(hr_channels=128, lr_channels=128)
440
  mask_lr, hr_feat, lr_feat = model(hr_feat=hr_feat, lr_feat=lr_feat)
441
  print(mask_lr.shape)
 
7
  import warnings
8
  import numpy as np
9
 
10
+
11
+ def xavier_init(module: nn.Module,
12
+ gain: float = 1,
13
+ bias: float = 0,
14
+ distribution: str = 'normal') -> None:
15
+ assert distribution in ['uniform', 'normal']
16
+ if hasattr(module, 'weight') and module.weight is not None:
17
+ if distribution == 'uniform':
18
+ nn.init.xavier_uniform_(module.weight, gain=gain)
19
+ else:
20
+ nn.init.xavier_normal_(module.weight, gain=gain)
21
+ if hasattr(module, 'bias') and module.bias is not None:
22
+ nn.init.constant_(module.bias, bias)
23
+
24
+ def carafe(x, normed_mask, kernel_size, group=1, up=1):
25
+ b, c, h, w = x.shape
26
+ _, m_c, m_h, m_w = normed_mask.shape
27
+ assert m_h == up * h
28
+ assert m_w == up * w
29
+ pad = kernel_size // 2
30
+ pad_x = F.pad(x, pad=[pad] * 4, mode='reflect')
31
+ unfold_x = F.unfold(pad_x, kernel_size=(kernel_size, kernel_size), stride=1, padding=0)
32
+ unfold_x = unfold_x.reshape(b, c * kernel_size * kernel_size, h, w)
33
+ unfold_x = F.interpolate(unfold_x, scale_factor=up, mode='nearest')
34
+ unfold_x = unfold_x.reshape(b, c, kernel_size * kernel_size, m_h, m_w)
35
+ normed_mask = normed_mask.reshape(b, 1, kernel_size * kernel_size, m_h, m_w)
36
+ res = unfold_x * normed_mask
37
+ res = res.sum(dim=2).reshape(b, c, m_h, m_w)
38
+ return res
39
+
40
+ def normal_init(module, mean=0, std=1, bias=0):
41
+ if hasattr(module, 'weight') and module.weight is not None:
42
+ nn.init.normal_(module.weight, mean, std)
43
+ if hasattr(module, 'bias') and module.bias is not None:
44
+ nn.init.constant_(module.bias, bias)
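The `carafe` re-implementation above removes the mmcv dependency. A quick shape check (kernel size and scale chosen purely for illustration; inside `DesneFusion` the normalized mask comes from `kernel_normalizer`):

```python
import torch
from densefusion import carafe  # PyTorch-only fallback defined above

b, c, h, w, k, up = 1, 8, 16, 16, 5, 2
x = torch.rand(b, c, h, w)
# One k*k kernel per output pixel, normalized over the kernel dimension
mask = torch.softmax(torch.rand(b, k * k, up * h, up * w), dim=1)
out = carafe(x, mask, kernel_size=k, group=1, up=up)
print(out.shape)  # torch.Size([1, 8, 32, 32])
```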
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45
 
46
 
47
  def constant_init(module, val, bias=0):
 
72
  return F.interpolate(input, size, scale_factor, mode, align_corners)
73
 
74
  def hamming2D(M, N):
 
 
 
 
 
 
 
 
 
 
 
 
 
75
  hamming_x = np.hamming(M)
76
  hamming_y = np.hamming(N)
 
77
  hamming_2d = np.outer(hamming_x, hamming_y)
78
  return hamming_2d
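For reference, `hamming2D` is just the outer product of two 1-D Hamming windows, w[n] = 0.54 - 0.46 cos(2πn/(M-1)); the result is later used as an optional regularizing window on the predicted kernels. A quick check:

```python
import numpy as np
from densefusion import hamming2D

w = hamming2D(4, 4)
print(w.shape)                                                  # (4, 4)
print(np.allclose(w, np.outer(np.hamming(4), np.hamming(4))))   # True
```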
79
 
80
+ class DesneFusion(nn.Module):
81
  def __init__(self,
82
  hr_channels,
83
  lr_channels,
 
90
  compressed_channels=64,
91
  align_corners=False,
92
  upsample_mode='nearest',
93
+ feature_resample=False,
94
  feature_resample_group=4,
95
+ comp_feat_upsample=True,
96
  use_high_pass=True,
97
  use_low_pass=True,
98
  hr_residual=True,
99
  semi_conv=True,
100
+ hamming_window=True,
101
  feature_resample_norm=True,
102
  **kwargs):
103
  super().__init__()
 
110
  self.compressed_channels = compressed_channels
111
  self.hr_channel_compressor = nn.Conv2d(hr_channels, self.compressed_channels,1)
112
  self.lr_channel_compressor = nn.Conv2d(lr_channels, self.compressed_channels,1)
113
+ self.content_encoder = nn.Conv2d(
114
  self.compressed_channels,
115
  lowpass_kernel ** 2 * self.up_group * self.scale_factor * self.scale_factor,
116
  self.encoder_kernel,
 
146
  self.register_buffer('hamming_lowpass', torch.FloatTensor([1.0]))
147
  self.register_buffer('hamming_highpass', torch.FloatTensor([1.0]))
148
  self.init_weights()
149
+ self.intermediate_results = {}
150
+
151
 
152
  def init_weights(self):
153
  for m in self.modules():
 
187
  return self._forward(hr_feat, lr_feat)
188
 
189
  def _forward(self, hr_feat, lr_feat):
190
+ # <<< Only modified part: store intermediate features without affecting the computation >>>
191
+
192
+ # Clear at the start of every forward pass so stale results are not kept
193
+ self.intermediate_results.clear()
194
+
195
+ # 1. Store the original inputs
196
+ self.intermediate_results['hr_feat_before'] = hr_feat.clone()
197
+ self.intermediate_results['lr_feat_before'] = lr_feat.clone()
198
+
199
  compressed_hr_feat = self.hr_channel_compressor(hr_feat)
200
  compressed_lr_feat = self.lr_channel_compressor(lr_feat)
201
  if self.semi_conv:
 
229
  mask_hr = self.content_encoder2(compressed_x)
230
 
231
  mask_lr = self.kernel_normalizer(mask_lr, self.lowpass_kernel, hamming=self.hamming_lowpass)
232
+
233
+ # 2. Store the low-pass-processed feature
234
+ lr_feat_after = carafe(lr_feat, mask_lr, self.lowpass_kernel, self.up_group, 2)
235
+ self.intermediate_results['lr_feat_after'] = lr_feat_after.clone()
236
+
237
  if self.semi_conv:
238
  lr_feat = carafe(lr_feat, mask_lr, self.lowpass_kernel, self.up_group, 2)
239
  else:
 
247
  if self.use_high_pass:
248
  mask_hr = self.kernel_normalizer(mask_hr, self.highpass_kernel, hamming=self.hamming_highpass)
249
  hr_feat_hf = hr_feat - carafe(hr_feat, mask_hr, self.highpass_kernel, self.up_group, 1)
250
+ self.intermediate_results['hr_feat_hf_component'] = hr_feat_hf.clone()
251
  if self.hr_residual:
252
  # print('using hr_residual')
253
  hr_feat = hr_feat_hf + hr_feat
254
  else:
255
  hr_feat = hr_feat_hf
256
+ self.intermediate_results['hr_feat_after'] = hr_feat.clone()
257
+ else:
258
+ # If high-pass processing is skipped, still store the corresponding entries to avoid errors
259
+ final_hr_feat = hr_feat
260
+ self.intermediate_results['hr_feat_hf_component'] = torch.zeros_like(final_hr_feat)
261
+ self.intermediate_results['hr_feat_after'] = final_hr_feat.clone()
262
+
263
 
264
  if self.feature_resample:
265
  # print(lr_feat.shape)
266
  lr_feat = self.dysampler(hr_x=compressed_hr_feat,
267
  lr_x=compressed_lr_feat, feat2sample=lr_feat)
268
+ self.intermediate_results['lr_feat_after'] = lr_feat.clone() # update when the dysampler is applied
269
+
270
  return mask_lr, hr_feat, lr_feat
271
 
272
 
273
 
274
  class LocalSimGuidedSampler(nn.Module):
275
  """
276
+ offset generator in DesneFusion
277
  """
278
  def __init__(self, in_channels, scale=2, style='lp', groups=4, use_direct_scale=True, kernel_size=1, local_window=3, sim_type='cos', norm=True, direction_feat='sim_concat'):
279
  super().__init__()
 
429
 
430
  hr_feat = torch.rand(1, 128, 512, 512)
431
  lr_feat = torch.rand(1, 128, 256, 256)
432
+ model = DesneFusion(hr_channels=128, lr_channels=128)
433
  mask_lr, hr_feat, lr_feat = model(hr_feat=hr_feat, lr_feat=lr_feat)
434
  print(mask_lr.shape)
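The only functional change in this file is the new `intermediate_results` dict; after a forward pass the captured features can be inspected. A minimal sketch (spatial sizes chosen arbitrarily for illustration):

```python
import torch
from densefusion import DesneFusion

model = DesneFusion(hr_channels=128, lr_channels=128)
hr = torch.rand(1, 128, 64, 64)
lr = torch.rand(1, 128, 32, 32)
_ = model(hr_feat=hr, lr_feat=lr)

# Features stored during _forward (the dict is cleared on every call)
for name, feat in model.intermediate_results.items():
    print(name, tuple(feat.shape))
```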
get_depth_normap.py CHANGED
@@ -25,22 +25,23 @@ def parse_args():
25
  def generate_depth_maps(source_root, model_path):
26
  source_root = Path(source_root)
27
  origin = source_root / 'origin'
28
- to_thermal_list = [origin]
29
 
30
  model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024]).cuda()
31
  model.load_state_dict(torch.load(model_path, map_location='cpu'))
32
  model.eval()
33
 
34
- thermal_path = source_root / 'depth'
 
35
 
36
  with torch.inference_mode():
37
- for to_thermal_item in to_thermal_list:
38
- folder_name = to_thermal_item.stem
39
- dst_path = thermal_path
40
 
41
  dst_path.mkdir(parents=True, exist_ok=True)
42
 
43
- bar = tqdm(to_thermal_item.glob('*'))
44
 
45
  for image_path in bar:
46
  try:
@@ -50,14 +51,13 @@ def generate_depth_maps(source_root, model_path):
50
  depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
51
  depth = depth.astype(np.uint8)
52
 
53
- print(depth.shape)
54
  np.save(f'{dst_path}/{image_path.stem}.npy', depth)
55
 
56
  except Exception as e:
57
  print(e)
58
  continue
59
 
60
- return thermal_path
61
 
62
 
63
  def calculate_normal_map(img_path: Path, ksize=5):
 
25
  def generate_depth_maps(source_root, model_path):
26
  source_root = Path(source_root)
27
  origin = source_root / 'origin'
28
+ to_depth_list = [origin]
29
 
30
  model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024]).cuda()
31
  model.load_state_dict(torch.load(model_path, map_location='cpu'))
32
  model.eval()
33
 
34
+ depth_path = source_root / 'depth'
35
+ depth_path.mkdir(parents=True, exist_ok=True)
36
 
37
  with torch.inference_mode():
38
+ for to_depth_item in to_depth_list:
39
+ folder_name = to_depth_item.stem
40
+ dst_path = depth_path
41
 
42
  dst_path.mkdir(parents=True, exist_ok=True)
43
 
44
+ bar = tqdm(to_depth_item.glob('*'))
45
 
46
  for image_path in bar:
47
  try:
 
51
  depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
52
  depth = depth.astype(np.uint8)
53
 
 
54
  np.save(f'{dst_path}/{image_path.stem}.npy', depth)
55
 
56
  except Exception as e:
57
  print(e)
58
  continue
59
 
60
+ return depth_path
61
 
62
 
63
  def calculate_normal_map(img_path: Path, ksize=5):
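`calculate_normal_map` is only partially visible in this hunk. As a rough illustration (an assumption, not the repository's exact implementation), a surface-normal map can be estimated from a depth map with Sobel gradients:

```python
import cv2
import numpy as np

def normal_from_depth(depth: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Hypothetical sketch: per-pixel normals from depth gradients."""
    depth = depth.astype(np.float32)
    dzdx = cv2.Sobel(depth, cv2.CV_32F, 1, 0, ksize=ksize)
    dzdy = cv2.Sobel(depth, cv2.CV_32F, 0, 1, ksize=ksize)
    normal = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
    norm = np.linalg.norm(normal, axis=2, keepdims=True)
    return normal / np.clip(norm, 1e-6, None)
```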
model.py CHANGED
@@ -7,7 +7,7 @@ from einops import rearrange, repeat
7
  import math
8
  from utils import grid_sample
9
 
10
- from freqfusion import FreqFusion
11
 
12
  #########################################
13
 
@@ -1114,7 +1114,7 @@ class ShadowFormer(nn.Module):
1114
 
1115
 
1116
 
1117
- class ShadowFormerFreq(nn.Module):
1118
  def __init__(self, img_size=256, in_chans=3,
1119
  embed_dim=32, depths=[2, 2, 2, 2, 2, 2, 2, 2, 2], num_heads=[1, 2, 4, 8, 16, 16, 8, 4, 2],
1120
  win_size=8, mlp_ratio=4., qkv_bias=True, qk_scale=None,
@@ -1265,13 +1265,13 @@ class ShadowFormerFreq(nn.Module):
1265
  self.relu = nn.LeakyReLU()
1266
  self.apply(self._init_weights)
1267
 
1268
- self.freqfusion1 = FreqFusion(hr_channels=256,
1269
  lr_channels=512)
1270
 
1271
- self.freqfusion2 = FreqFusion(hr_channels=128,
1272
  lr_channels=256)
1273
 
1274
- self.freqfusion3 = FreqFusion(hr_channels=64,
1275
  lr_channels=128)
1276
 
1277
  def _init_weights(self, m):
@@ -1362,7 +1362,7 @@ class ShadowFormerFreq(nn.Module):
1362
  deconv0_B_C_H_W = deconv0.view(deconv0.shape[0], int(deconv0.shape[1]**0.5), int(deconv0.shape[1]**0.5), 256).permute(0, 3, 1, 2)
1363
  # print(f'1.{deconv0_B_C_H_W.shape=}') # 1, 256, 64, 64
1364
 
1365
- _, deconv0_B_C_H_W, lr_feat = self.freqfusion1(hr_feat=deconv0_B_C_H_W, lr_feat=conv3_B_C_H_W) # 1, 256, 64, 64 & 1, 512, 32, 32
1366
  # print(f'1.{deconv0.shape=}, {lr_feat.shape=}') # deconv0.shape=torch.Size([1, 256, 64, 64]), lr_feat.shape=torch.Size([1, 512, 64, 64])
1367
 
1368
  deconv0 = deconv0_B_C_H_W.view(deconv0_B_C_H_W.shape[0], 256, -1).permute(0, 2, 1)
@@ -1382,7 +1382,7 @@ class ShadowFormerFreq(nn.Module):
1382
  deconv1_B_C_H_W = deconv1.view(deconv1.shape[0], int(deconv1.shape[1]**0.5), int(deconv1.shape[1]**0.5), 128).permute(0, 3, 1, 2)
1383
  # print(f'2.{deconv1_B_C_H_W.shape=}') # 1, 128, 128, 128
1384
 
1385
- _, deconv1_B_C_H_W, lr_feat = self.freqfusion2(hr_feat=deconv1_B_C_H_W, lr_feat=deconv0_B_C_H_W) # 1, 128, 128, 128 & 1, 256, 64, 64
1386
 
1387
  # print(f'2.{deconv1_B_C_H_W.shape=}, {lr_feat.shape=}') # hr_feat.shape=torch.Size([1, 128, 128, 128]), lr_feat.shape=torch.Size([1, 256, 128, 128])
1388
 
@@ -1403,7 +1403,7 @@ class ShadowFormerFreq(nn.Module):
1403
  deconv2_B_C_H_W = deconv2.view(deconv2.shape[0], int(deconv2.shape[1]**0.5), int(deconv2.shape[1]**0.5), 64).permute(0, 3, 1, 2)
1404
  # print(f'3.{deconv2_B_C_H_W.shape=}')
1405
 
1406
- _, deconv2_B_C_H_W, lr_feat = self.freqfusion3(hr_feat=deconv2_B_C_H_W, lr_feat=deconv1_B_C_H_W) # 1, 64, 256, 256 & 1, 128, 128, 128
1407
 
1408
  # print('*'*5, f'3.{deconv2_B_C_H_W.shape=}, {lr_feat.shape=}')
1409
 
 
7
  import math
8
  from utils import grid_sample
9
 
10
+ from densefusion import DesneFusion
11
 
12
  #########################################
13
 
 
1114
 
1115
 
1116
 
1117
+ class DenseSR(nn.Module):
1118
  def __init__(self, img_size=256, in_chans=3,
1119
  embed_dim=32, depths=[2, 2, 2, 2, 2, 2, 2, 2, 2], num_heads=[1, 2, 4, 8, 16, 16, 8, 4, 2],
1120
  win_size=8, mlp_ratio=4., qkv_bias=True, qk_scale=None,
 
1265
  self.relu = nn.LeakyReLU()
1266
  self.apply(self._init_weights)
1267
 
1268
+ self.densefusion1 = DesneFusion(hr_channels=256,
1269
  lr_channels=512)
1270
 
1271
+ self.densefusion2 = DesneFusion(hr_channels=128,
1272
  lr_channels=256)
1273
 
1274
+ self.densefusion3 = DesneFusion(hr_channels=64,
1275
  lr_channels=128)
1276
 
1277
  def _init_weights(self, m):
 
1362
  deconv0_B_C_H_W = deconv0.view(deconv0.shape[0], int(deconv0.shape[1]**0.5), int(deconv0.shape[1]**0.5), 256).permute(0, 3, 1, 2)
1363
  # print(f'1.{deconv0_B_C_H_W.shape=}') # 1, 256, 64, 64
1364
 
1365
+ _, deconv0_B_C_H_W, lr_feat = self.densefusion1(hr_feat=deconv0_B_C_H_W, lr_feat=conv3_B_C_H_W) # 1, 256, 64, 64 & 1, 512, 32, 32
1366
  # print(f'1.{deconv0.shape=}, {lr_feat.shape=}') # deconv0.shape=torch.Size([1, 256, 64, 64]), lr_feat.shape=torch.Size([1, 512, 64, 64])
1367
 
1368
  deconv0 = deconv0_B_C_H_W.view(deconv0_B_C_H_W.shape[0], 256, -1).permute(0, 2, 1)
 
1382
  deconv1_B_C_H_W = deconv1.view(deconv1.shape[0], int(deconv1.shape[1]**0.5), int(deconv1.shape[1]**0.5), 128).permute(0, 3, 1, 2)
1383
  # print(f'2.{deconv1_B_C_H_W.shape=}') # 1, 128, 128, 128
1384
 
1385
+ _, deconv1_B_C_H_W, lr_feat = self.densefusion2(hr_feat=deconv1_B_C_H_W, lr_feat=deconv0_B_C_H_W) # 1, 128, 128, 128 & 1, 256, 64, 64
1386
 
1387
  # print(f'2.{deconv1_B_C_H_W.shape=}, {lr_feat.shape=}') # hr_feat.shape=torch.Size([1, 128, 128, 128]), lr_feat.shape=torch.Size([1, 256, 128, 128])
1388
 
 
1403
  deconv2_B_C_H_W = deconv2.view(deconv2.shape[0], int(deconv2.shape[1]**0.5), int(deconv2.shape[1]**0.5), 64).permute(0, 3, 1, 2)
1404
  # print(f'3.{deconv2_B_C_H_W.shape=}')
1405
 
1406
+ _, deconv2_B_C_H_W, lr_feat = self.densefusion3(hr_feat=deconv2_B_C_H_W, lr_feat=deconv1_B_C_H_W) # 1, 64, 256, 256 & 1, 128, 128, 128
1407
 
1408
  # print('*'*5, f'3.{deconv2_B_C_H_W.shape=}, {lr_feat.shape=}')
1409
 
test_shadow.py CHANGED
@@ -9,10 +9,8 @@ from torch.utils.data import DataLoader
9
  from torch.nn.parallel import DistributedDataParallel as DDP
10
  import torch.nn.functional as F
11
  import random
12
- # from utils.loader import get_validation_data
13
  from utils.loader import get_test_data
14
  import utils
15
- import cv2
16
  import torch.distributed as dist
17
  from skimage.metrics import peak_signal_noise_ratio as psnr_loss
18
  from skimage.metrics import structural_similarity as ssim_loss
@@ -21,10 +19,9 @@ parser.add_argument('--input_dir', default='test_dir',
21
  type=str, help='Directory of validation images')
22
  parser.add_argument('--result_dir', default='./output_dir',
23
  type=str, help='Directory for results')
24
- parser.add_argument('--weights', default='ACVLab_shadow.pth'
25
  ,type=str, help='Path to weights')
26
- # parser.add_argument('--arch', default='ShadowFormer', type=str, help='arch')
27
- parser.add_argument('--arch', type=str, default='ShadowFormerFreq', help='archtechture')
28
  parser.add_argument('--batch_size', default=1, type=int, help='Batch size for dataloader')
29
  parser.add_argument('--save_images', action='store_true', default=False, help='Save denoised images in result directory')
30
  parser.add_argument('--cal_metrics', action='store_true', default=False, help='Measure denoised images with GT')
@@ -51,49 +48,38 @@ class SlidingWindowInference:
51
  self.img_multiple_of = img_multiple_of
52
 
53
  def _pad_input(self, x, h_pad, w_pad):
54
- """Handle padding using reflection padding"""
55
  return F.pad(x, (0, w_pad, 0, h_pad), 'reflect')
56
 
57
  def __call__(self, model, input_, point, normal, dino_net, device):
58
- # Save original dimensions
59
  original_height, original_width = input_.shape[2], input_.shape[3]
60
- # print(f"Original size: {original_height}x{original_width}")
61
 
62
- # Calculate minimum dimensions needed (at least window_size and multiple of img_multiple_of)
63
  H = max(self.window_size,
64
  ((original_height + self.img_multiple_of - 1) // self.img_multiple_of) * self.img_multiple_of)
65
  W = max(self.window_size,
66
  ((original_width + self.img_multiple_of - 1) // self.img_multiple_of) * self.img_multiple_of)
67
- # print(f"Target padded size: {H}x{W}")
68
 
69
- # Calculate required padding
70
  padh = H - original_height
71
  padw = W - original_width
72
- # print(f"Padding: h={padh}, w={padw}")
73
 
74
  # Pad all inputs
75
  input_pad = self._pad_input(input_, padh, padw)
76
  point_pad = self._pad_input(point, padh, padw)
77
  normal_pad = self._pad_input(normal, padh, padw)
78
 
79
- # If image was smaller than window_size, process it as a single window
80
  if original_height <= self.window_size and original_width <= self.window_size:
81
- # print("Image smaller than window size, processing as single padded window")
82
 
83
- # For DINO features
84
  DINO_patch_size = 14
85
  h_size = H * DINO_patch_size // 8
86
  w_size = W * DINO_patch_size // 8
87
 
88
  UpSample_window = torch.nn.UpsamplingBilinear2d(size=(h_size, w_size))
89
 
90
- # Get DINO features
91
  with torch.no_grad():
92
  input_DINO = UpSample_window(input_pad)
93
  dino_features = dino_net.module.get_intermediate_layers(input_DINO, 4, True)
94
 
95
  # Model inference
96
- with torch.cuda.amp.autocast():
97
  restored = model(input_pad, dino_features, point_pad, normal_pad)
98
 
99
  # Crop back to original size
@@ -104,7 +90,6 @@ class SlidingWindowInference:
104
  stride = self.window_size - self.overlap
105
  h_steps = (H - self.window_size + stride - 1) // stride + 1
106
  w_steps = (W - self.window_size + stride - 1) // stride + 1
107
- # print(f"Steps: h={h_steps}, w={w_steps}")
108
 
109
  # Create output tensor and counter
110
  output = torch.zeros_like(input_pad)
@@ -123,8 +108,6 @@ class SlidingWindowInference:
123
  point_window = point_pad[:, :, h_start:h_end, w_start:w_end]
124
  normal_window = normal_pad[:, :, h_start:h_end, w_start:w_end]
125
 
126
- # print(f"Processing window at ({h_idx}, {w_idx}): {input_window.shape}")
127
-
128
  # For DINO features
129
  DINO_patch_size = 14
130
  h_size = self.window_size * DINO_patch_size // 8
@@ -138,7 +121,7 @@ class SlidingWindowInference:
138
  dino_features = dino_net.module.get_intermediate_layers(input_DINO, 4, True)
139
 
140
  # Model inference
141
- with torch.cuda.amp.autocast():
142
  restored = model(input_window, dino_features, point_window, normal_window)
143
 
144
  # Create weight mask for smooth transition
@@ -180,7 +163,7 @@ g = torch.Generator()
180
  g.manual_seed(1234)
181
 
182
  torch.backends.cudnn.benchmark = True
183
- # torch.backends.cudnn.deterministic = True
184
  ######### Model ###########
185
  model_restoration = utils.get_arch(args)
186
  model_restoration.to(device)
@@ -218,38 +201,19 @@ with torch.no_grad():
218
  ssim_val_rgb_list = []
219
  rmse_val_rgb_list = []
220
  for ii, data_test in enumerate(tqdm(test_loader), 0):
221
- # rgb_gt = data_test[0].numpy().squeeze().transpose((1, 2, 0))
222
  rgb_noisy = data_test[1].to(device)
223
  point = data_test[2].to(device)
224
  normal = data_test[3].to(device)
225
  filenames = data_test[4]
226
 
227
- # Pad the input if not_multiple_of win_size * 8
228
- # height, width = rgb_noisy.shape[2], rgb_noisy.shape[3]
229
- # H, W = ((height + img_multiple_of) // img_multiple_of) * img_multiple_of, (
230
- # (width + img_multiple_of) // img_multiple_of) * img_multiple_of
231
 
232
- # padh = H - height if height % img_multiple_of != 0 else 0
233
- # padw = W - width if width % img_multiple_of != 0 else 0
234
- # rgb_noisy = F.pad(rgb_noisy, (0, padw, 0, padh), 'reflect')
235
- # point = F.pad(point, (0, padw, 0, padh), 'reflect')
236
- # normal = F.pad(normal, (0, padw, 0, padh), 'reflect')
237
- # print(f'{rgb_noisy.shape=} {point.shape=} {normal.shape=}')
238
- # UpSample_val = nn.UpsamplingBilinear2d(
239
- # size=((int)(rgb_noisy.shape[2] * (DINO_patch_size / 8)),
240
- # (int)(rgb_noisy.shape[3] * (DINO_patch_size / 8))))
241
- # with torch.cuda.amp.autocast():
242
- # # DINO_V2
243
- # input_DINO = UpSample_val(rgb_noisy)
244
- # dino_mat_features = DINO_Net.module.get_intermediate_layers(input_DINO, 4, True)
245
- # rgb_restored = model_restoration(rgb_noisy, dino_mat_features, point, normal)
246
  sliding_window = SlidingWindowInference(
247
- window_size=512, # 與訓練相同的 patch size
248
- overlap=64, # 相應調整 overlap
249
  img_multiple_of=8 * args.win_size
250
  )
251
 
252
- with torch.cuda.amp.autocast():
253
  rgb_restored = sliding_window(
254
  model=model_restoration,
255
  input_=rgb_noisy,
@@ -261,7 +225,6 @@ with torch.no_grad():
261
 
262
 
263
  rgb_restored = torch.clamp(rgb_restored, 0.0, 1.0)
264
- # rgb_restored = rgb_restored[:, : ,:height, :width]
265
  rgb_restored = torch.clamp(rgb_restored, 0, 1).cpu().numpy().squeeze().transpose((1, 2, 0))
266
 
267
 
 
9
  from torch.nn.parallel import DistributedDataParallel as DDP
10
  import torch.nn.functional as F
11
  import random
 
12
  from utils.loader import get_test_data
13
  import utils
 
14
  import torch.distributed as dist
15
  from skimage.metrics import peak_signal_noise_ratio as psnr_loss
16
  from skimage.metrics import structural_similarity as ssim_loss
 
19
  type=str, help='Directory of validation images')
20
  parser.add_argument('--result_dir', default='./output_dir',
21
  type=str, help='Directory for results')
22
+ parser.add_argument('--weights', default='best_model_densefusion.pth'
23
  ,type=str, help='Path to weights')
24
+ parser.add_argument('--arch', type=str, default='DenseSR', help='architecture')
 
25
  parser.add_argument('--batch_size', default=1, type=int, help='Batch size for dataloader')
26
  parser.add_argument('--save_images', action='store_true', default=False, help='Save denoised images in result directory')
27
  parser.add_argument('--cal_metrics', action='store_true', default=False, help='Measure denoised images with GT')
 
48
  self.img_multiple_of = img_multiple_of
49
 
50
  def _pad_input(self, x, h_pad, w_pad):
 
51
  return F.pad(x, (0, w_pad, 0, h_pad), 'reflect')
52
 
53
  def __call__(self, model, input_, point, normal, dino_net, device):
 
54
  original_height, original_width = input_.shape[2], input_.shape[3]
 
55
 
 
56
  H = max(self.window_size,
57
  ((original_height + self.img_multiple_of - 1) // self.img_multiple_of) * self.img_multiple_of)
58
  W = max(self.window_size,
59
  ((original_width + self.img_multiple_of - 1) // self.img_multiple_of) * self.img_multiple_of)
 
60
 
 
61
  padh = H - original_height
62
  padw = W - original_width
 
63
 
64
  # Pad all inputs
65
  input_pad = self._pad_input(input_, padh, padw)
66
  point_pad = self._pad_input(point, padh, padw)
67
  normal_pad = self._pad_input(normal, padh, padw)
68
 
 
69
  if original_height <= self.window_size and original_width <= self.window_size:
 
70
 
 
71
  DINO_patch_size = 14
72
  h_size = H * DINO_patch_size // 8
73
  w_size = W * DINO_patch_size // 8
74
 
75
  UpSample_window = torch.nn.UpsamplingBilinear2d(size=(h_size, w_size))
76
 
 
77
  with torch.no_grad():
78
  input_DINO = UpSample_window(input_pad)
79
  dino_features = dino_net.module.get_intermediate_layers(input_DINO, 4, True)
80
 
81
  # Model inference
82
+ with torch.amp.autocast(device_type='cuda'):
83
  restored = model(input_pad, dino_features, point_pad, normal_pad)
84
 
85
  # Crop back to original size
 
90
  stride = self.window_size - self.overlap
91
  h_steps = (H - self.window_size + stride - 1) // stride + 1
92
  w_steps = (W - self.window_size + stride - 1) // stride + 1
 
93
 
94
  # Create output tensor and counter
95
  output = torch.zeros_like(input_pad)
 
108
  point_window = point_pad[:, :, h_start:h_end, w_start:w_end]
109
  normal_window = normal_pad[:, :, h_start:h_end, w_start:w_end]
110
 
 
 
111
  # For DINO features
112
  DINO_patch_size = 14
113
  h_size = self.window_size * DINO_patch_size // 8
 
121
  dino_features = dino_net.module.get_intermediate_layers(input_DINO, 4, True)
122
 
123
  # Model inference
124
+ with torch.amp.autocast(device_type='cuda'):
125
  restored = model(input_window, dino_features, point_window, normal_window)
126
 
127
  # Create weight mask for smooth transition
 
163
  g.manual_seed(1234)
164
 
165
  torch.backends.cudnn.benchmark = True
166
+
167
  ######### Model ###########
168
  model_restoration = utils.get_arch(args)
169
  model_restoration.to(device)
 
201
  ssim_val_rgb_list = []
202
  rmse_val_rgb_list = []
203
  for ii, data_test in enumerate(tqdm(test_loader), 0):
 
204
  rgb_noisy = data_test[1].to(device)
205
  point = data_test[2].to(device)
206
  normal = data_test[3].to(device)
207
  filenames = data_test[4]
208
 
 
 
 
 
209
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
210
  sliding_window = SlidingWindowInference(
211
+ window_size=512,
212
+ overlap=64,
213
  img_multiple_of=8 * args.win_size
214
  )
215
 
216
+ with torch.amp.autocast(device_type='cuda'):
217
  rgb_restored = sliding_window(
218
  model=model_restoration,
219
  input_=rgb_noisy,
 
225
 
226
 
227
  rgb_restored = torch.clamp(rgb_restored, 0.0, 1.0)
 
228
  rgb_restored = torch.clamp(rgb_restored, 0, 1).cpu().numpy().squeeze().transpose((1, 2, 0))
229
 
230
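`SlidingWindowInference` above pads the inputs to a multiple of 8 * win_size, tiles them with a 512-pixel window and a 64-pixel overlap, and blends overlapping predictions with a weight mask. A minimal sketch of the tiling arithmetic only (illustrative; the class itself also handles padding, DINO features, and blending, and may clamp window starts differently):

```python
def tile_coords(length: int, window: int = 512, overlap: int = 64):
    """Start/end indices of overlapping windows covering `length` pixels."""
    stride = window - overlap
    steps = (length - window + stride - 1) // stride + 1
    coords = []
    for i in range(steps):
        start = min(i * stride, max(length - window, 0))  # keep window inside the image
        coords.append((start, start + window))
    return coords

print(tile_coords(1024))  # [(0, 512), (448, 960), (512, 1024)]
```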
 
utils/model_utils.py CHANGED
@@ -56,7 +56,7 @@ def load_optim(optimizer, weights):
56
  return lr
57
 
58
  def get_arch(opt):
59
- from model import ShadowFormer, ShadowFormerFreq
60
  arch = opt.arch
61
 
62
  print('You choose '+arch+'...')
@@ -64,8 +64,8 @@ def get_arch(opt):
64
  model_restoration = ShadowFormer(img_size=opt.train_ps,embed_dim=opt.embed_dim,
65
  win_size=opt.win_size,token_projection=opt.token_projection,
66
  token_mlp=opt.token_mlp)
67
- elif arch == 'ShadowFormerFreq':
68
- model_restoration = ShadowFormerFreq(img_size=opt.train_ps,embed_dim=opt.embed_dim,
69
  win_size=opt.win_size,token_projection=opt.token_projection,
70
  token_mlp=opt.token_mlp)
71
  else:
 
56
  return lr
57
 
58
  def get_arch(opt):
59
+ from model import ShadowFormer, DenseSR
60
  arch = opt.arch
61
 
62
  print('You choose '+arch+'...')
 
64
  model_restoration = ShadowFormer(img_size=opt.train_ps,embed_dim=opt.embed_dim,
65
  win_size=opt.win_size,token_projection=opt.token_projection,
66
  token_mlp=opt.token_mlp)
67
+ elif arch == 'DenseSR':
68
+ model_restoration = DenseSR(img_size=opt.train_ps,embed_dim=opt.embed_dim,
69
  win_size=opt.win_size,token_projection=opt.token_projection,
70
  token_mlp=opt.token_mlp)
71
  else:
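For reference, `get_arch` only needs an options object carrying the fields used above, and `test_shadow.py` calls it through the `utils` package. A minimal sketch (the `token_projection` and `token_mlp` values are assumed placeholders, not confirmed defaults):

```python
from argparse import Namespace
import utils  # test_shadow.py accesses get_arch via utils.get_arch

# Illustrative options: train_ps/embed_dim/win_size mirror the DenseSR defaults shown
# in model.py; token_projection and token_mlp are assumptions for this sketch.
opt = Namespace(arch='DenseSR', train_ps=256, embed_dim=32, win_size=8,
                token_projection='linear', token_mlp='leff')
model_restoration = utils.get_arch(opt)  # prints: You choose DenseSR...
```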