Optimized SegFormer (B5) for Tree Crown Delineation

License: MIT Β· Paper Β· GitHub Repository

πŸ† State-of-the-Art Results: Optimized Standard SegFormer achieves Boundary IoU 0.6201 on OAM-TCD Dataset

This model is the official "Optimized Standard SegFormer" from the ACPR 2025 paper, "Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation." It demonstrates that a well-optimized training strategy is more critical than architectural modifications for achieving superior boundary delineation in tree crown segmentation tasks.

The key to this model's performance is the combination of class weighting and full-resolution (HΓ—W) loss supervision during training.
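Concretely, instead of computing the loss on the H/4Γ—W/4 logits that the SegFormer head emits, the logits are first upsampled to the full label resolution, and a class-weighted cross-entropy counteracts the background/tree-crown pixel imbalance. Below is a minimal sketch of this idea; the function name, variable names, and the example weight values are illustrative, not the repository's exact training code:

import torch
import torch.nn.functional as F

# logits: (B, num_classes, H/4, W/4) from the SegFormer head; labels: (B, H, W)
def full_resolution_weighted_loss(logits, labels, class_weights):
    # Upsample logits to the full label resolution before computing the loss
    logits = F.interpolate(
        logits, size=labels.shape[-2:], mode="bilinear", align_corners=False
    )
    # Class-weighted cross-entropy compensates for the pixel-class imbalance
    return F.cross_entropy(logits, labels, weight=class_weights)

# Example call with illustrative weights for 2 classes (background, tree_crown):
# loss = full_resolution_weighted_loss(logits, labels, torch.tensor([0.6, 1.4]))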

πŸš€ How to Use

This model can be easily loaded from the Hub and used for inference.

First, ensure you have the necessary libraries installed:

pip install transformers torch Pillow

Then, use the following Python snippet to run inference on an image:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
import requests
import numpy as np

# Load the image processor and model from the Hub
processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
model = AutoModelForSemanticSegmentation.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")

# Example: Load an image from a URL
# url = "https://huggingface.co/datasets/restor/tcd/resolve/main/data/images/test/490.png"
# image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Or, load a local image
image = Image.open("path/to/your/image.png").convert("RGB")

# Preprocess the image
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the output to get the segmentation map
logits = outputs.logits
# Upsample logits to original image size
upsampled_logits = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1], # (height, width)
    mode='bilinear',
    align_corners=False
)
# Get the predicted class for each pixel (as a NumPy array for the masking below)
pred_seg = upsampled_logits.argmax(dim=1)[0].cpu().numpy()

# To visualize, you can convert the segmentation map to a color image
# This example assumes 2 classes: background (0) and tree_crown (1)
color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
color_seg[pred_seg == 1] = [0, 255, 0] # Green for tree crowns
color_seg_img = Image.fromarray(color_seg)

# Blend the segmentation map with the original image for visualization
blended_img = Image.blend(image.convert("RGBA"), color_seg_img.convert("RGBA"), alpha=0.5)

# Save or display the result
blended_img.save("segmentation_result.png")
blended_img.show()
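As a quick follow-up, you can derive simple statistics from the predicted mask, for example the fraction of the image covered by tree crowns. This reuses pred_seg from the snippet above; the output filename is an arbitrary choice:

# Fraction of pixels predicted as tree crown (class 1)
crown_cover = (pred_seg == 1).mean()
print(f"Predicted tree crown cover: {crown_cover:.1%}")

# Save the raw binary mask alongside the blended visualization
Image.fromarray(((pred_seg == 1) * 255).astype(np.uint8)).save("crown_mask.png")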

πŸ”¬ Model Details

  • Architecture: Standard SegFormer with an nvidia/mit-b5 backbone (β‰ˆ84.6M parameters, FP32).
  • Optimization: Trained with class weighting and full-resolution loss supervision. The model's logits are upsampled to the original image dimensions before the loss is computed, forcing the model to learn fine-grained boundaries (see the loss sketch above).
  • Image Processor: The image processor (preprocessor_config.json) is configured with do_resize=False, so images are processed at their native resolution; you can verify this as shown below.
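A quick check of the resize behavior, assuming the checkpoint's preprocessor_config.json is unchanged:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
print(processor.do_resize)  # Expected: False -- images keep their native resolution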

πŸ“ˆ Evaluation

The model was trained and evaluated on the restor/tcd dataset. The key finding of the paper is that this optimized standard SegFormer outperforms more complex architectural variants.

| Configuration | IoU | F1-Score | Boundary IoU | Notes |
|---|---|---|---|---|
| Std SegFormer + CW + HΓ—W Loss (This Model) | 0.848 | 0.918 | πŸ† 0.620 | State-of-the-art |
| Std SegFormer + HΓ—W Loss | 0.828 | 0.906 | 0.610 | Shows HΓ—W loss impact |
| Std SegFormer + CW + H/4 Loss | 0.844 | 0.916 | 0.606 | Strong baseline |
| Std SegFormer (baseline) | 0.817 | 0.899 | 0.590 | Standard configuration |
| TrueResSegFormer + CW | 0.838 | 0.912 | 0.589 | Architectural variant |
| TrueResSegFormer (no CW) | 0.797 | 0.887 | 0.577 | Without optimization |
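Boundary IoU, the headline metric above, measures overlap only within a thin band around each mask's contour, making it far more sensitive to crown-edge quality than plain IoU. Below is a minimal NumPy/SciPy sketch of the standard formulation (Cheng et al., 2021); the erosion-based band extraction and the band width d are illustrative approximations, not the paper's exact evaluation code:

import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask, d=2):
    """Pixels within roughly d pixels of the mask boundary (mask minus its erosion)."""
    eroded = binary_erosion(mask, iterations=d)
    return mask & ~eroded

def boundary_iou(pred, gt, d=2):
    """IoU computed only over the boundary bands of prediction and ground truth."""
    pred_b = boundary_band(pred.astype(bool), d)
    gt_b = boundary_band(gt.astype(bool), d)
    inter = np.logical_and(pred_b, gt_b).sum()
    union = np.logical_or(pred_b, gt_b).sum()
    return inter / union if union > 0 else 1.0  # both boundaries empty -> perfect score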

Training Procedure

The model was trained using the following command from the original repository:

python main.py train \
    --dataset_name restor/tcd \
    --model_name nvidia/mit-b5 \
    --output_dir ./outputs/segformer_b5_cw_full_res \
    --num_epochs 50 \
    --learning_rate 1e-5 \
    --class_weights_enabled \
    --apply_loss_at_original_resolution

  • Optimizer: AdamW
  • Learning Rate: 1e-5
  • Epochs: 50
  • Batch Size: Varies based on hardware (e.g., 1 per GPU with gradient accumulation; a minimal accumulation loop is sketched below)
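Gradient accumulation simulates a larger effective batch size when only one image fits in GPU memory: gradients are summed over several forward/backward passes before a single optimizer step. A minimal sketch, where model, dataloader, and the accum_steps value are assumptions rather than the repository's exact training loop:

import torch

accum_steps = 8  # assumed value: effective batch size 8 with per-step batch size 1
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for step, batch in enumerate(dataloader):  # dataloader yields dicts of tensors (assumed)
    outputs = model(**batch)               # HF models return .loss when labels are passed
    loss = outputs.loss / accum_steps      # scale so accumulated gradients average correctly
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()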

πŸ“„ Citation

If you use this model or the associated research in your work, please cite the original paper:

@inproceedings{wilaiwongsakul2025empirical,
  title={Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation},
  author={Wilaiwongsakul, Attavit and Liang, Bin and Jia, Wenfeng and Chen, Fang},
  booktitle={Asian Conference on Pattern Recognition (ACPR)},
  year={2025}
}