Optimized SegFormer (B5) for Tree Crown Delineation

License: MIT Β· Paper Β· GitHub Repository

πŸ† State-of-the-Art Results: Optimized Standard SegFormer achieves Boundary IoU 0.6201 on OAM-TCD Dataset

This model is the official "Optimized Standard SegFormer" from the ACPR 2025 paper, "Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation." It demonstrates that a well-optimized training strategy is more critical than architectural modifications for achieving superior boundary delineation in tree crown segmentation tasks.

The key to this model's performance is the combination of class weighting and full-resolution (HΓ—W) loss supervision during training.
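Concretely, instead of computing the loss on the H/4Γ—W/4 logits that the SegFormer head emits, the logits are first upsampled to the full label resolution, and a class-weighted cross-entropy counteracts the background/tree-crown pixel imbalance. Below is a minimal sketch of this idea; the function name, variable names, and the example weight values are illustrative, not the repository's exact training code:

import torch
import torch.nn.functional as F

# logits: (B, num_classes, H/4, W/4) from the SegFormer head; labels: (B, H, W)
def full_resolution_weighted_loss(logits, labels, class_weights):
    # Upsample logits to the full label resolution before computing the loss
    logits = F.interpolate(
        logits, size=labels.shape[-2:], mode="bilinear", align_corners=False
    )
    # Class-weighted cross-entropy compensates for the pixel-class imbalance
    return F.cross_entropy(logits, labels, weight=class_weights)

# Example call with illustrative weights for 2 classes (background, tree_crown):
# loss = full_resolution_weighted_loss(logits, labels, torch.tensor([0.6, 1.4]))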

πŸš€ How to Use

This model can be easily loaded from the Hub and used for inference.

First, ensure you have the necessary libraries installed:

pip install transformers torch Pillow

Then, use the following Python snippet to run inference on an image:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
import requests
import numpy as np

# Load the image processor and model from the Hub
processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
model = AutoModelForSemanticSegmentation.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")

# Example: Load an image from a URL
# url = "https://huggingface.co/datasets/restor/tcd/resolve/main/data/images/test/490.png"
# image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Or, load a local image
image = Image.open("path/to/your/image.png").convert("RGB")

# Preprocess the image
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the output to get the segmentation map
logits = outputs.logits
# Upsample logits to original image size
upsampled_logits = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1], # (height, width)
    mode='bilinear',
    align_corners=False
)
# Get the predicted class for each pixel (as a NumPy array for the masking below)
pred_seg = upsampled_logits.argmax(dim=1)[0].cpu().numpy()

# To visualize, you can convert the segmentation map to a color image
# This example assumes 2 classes: background (0) and tree_crown (1)
color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
color_seg[pred_seg == 1] = [0, 255, 0] # Green for tree crowns
color_seg_img = Image.fromarray(color_seg)

# Blend the segmentation map with the original image for visualization
blended_img = Image.blend(image.convert("RGBA"), color_seg_img.convert("RGBA"), alpha=0.5)

# Save or display the result
blended_img.save("segmentation_result.png")
blended_img.show()
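As a quick follow-up, you can derive simple statistics from the predicted mask, for example the fraction of the image covered by tree crowns. This reuses pred_seg from the snippet above; the output filename is an arbitrary choice:

# Fraction of pixels predicted as tree crown (class 1)
crown_cover = (pred_seg == 1).mean()
print(f"Predicted tree crown cover: {crown_cover:.1%}")

# Save the raw binary mask alongside the blended visualization
Image.fromarray(((pred_seg == 1) * 255).astype(np.uint8)).save("crown_mask.png")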

πŸ”¬ Model Details

  • Architecture: Standard SegFormer with an nvidia/mit-b5 backbone (β‰ˆ84.6M parameters, FP32).
  • Optimization: Trained with class weighting and full-resolution loss supervision. The model's logits are upsampled to the original image dimensions before the loss is computed, forcing the model to learn fine-grained boundaries (see the loss sketch above).
  • Image Processor: The image processor (preprocessor_config.json) is configured with do_resize=False, so images are processed at their native resolution; you can verify this as shown below.
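A quick check of the resize behavior, assuming the checkpoint's preprocessor_config.json is unchanged:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
print(processor.do_resize)  # Expected: False -- images keep their native resolution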

πŸ“ˆ Evaluation

The model was trained and evaluated on the restor/tcd dataset. The key finding of the paper is that this optimized standard SegFormer outperforms more complex architectural variants.

| Configuration | IoU | F1-Score | Boundary IoU | Notes |
|---|---|---|---|---|
| Std SegFormer + CW + HΓ—W Loss (This Model) | 0.848 | 0.918 | πŸ† 0.620 | State-of-the-art |
| Std SegFormer + HΓ—W Loss | 0.828 | 0.906 | 0.610 | Shows HΓ—W loss impact |
| Std SegFormer + CW + H/4 Loss | 0.844 | 0.916 | 0.606 | Strong baseline |
| Std SegFormer (baseline) | 0.817 | 0.899 | 0.590 | Standard configuration |
| TrueResSegFormer + CW | 0.838 | 0.912 | 0.589 | Architectural variant |
| TrueResSegFormer (no CW) | 0.797 | 0.887 | 0.577 | Without optimization |
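Boundary IoU, the headline metric above, measures overlap only within a thin band around each mask's contour, making it far more sensitive to crown-edge quality than plain IoU. Below is a minimal NumPy/SciPy sketch of the standard formulation (Cheng et al., 2021); the erosion-based band extraction and the band width d are illustrative approximations, not the paper's exact evaluation code:

import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask, d=2):
    """Pixels within roughly d pixels of the mask boundary (mask minus its erosion)."""
    eroded = binary_erosion(mask, iterations=d)
    return mask & ~eroded

def boundary_iou(pred, gt, d=2):
    """IoU computed only over the boundary bands of prediction and ground truth."""
    pred_b = boundary_band(pred.astype(bool), d)
    gt_b = boundary_band(gt.astype(bool), d)
    inter = np.logical_and(pred_b, gt_b).sum()
    union = np.logical_or(pred_b, gt_b).sum()
    return inter / union if union > 0 else 1.0  # both boundaries empty -> perfect score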

Training Procedure

The model was trained using the following command from the original repository:

python main.py train \
    --dataset_name restor/tcd \
    --model_name nvidia/mit-b5 \
    --output_dir ./outputs/segformer_b5_cw_full_res \
    --num_epochs 50 \
    --learning_rate 1e-5 \
    --class_weights_enabled \
    --apply_loss_at_original_resolution

  • Optimizer: AdamW
  • Learning Rate: 1e-5
  • Epochs: 50
  • Batch Size: Varies based on hardware (e.g., 1 per GPU with gradient accumulation; a minimal accumulation loop is sketched below)
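Gradient accumulation simulates a larger effective batch size when only one image fits in GPU memory: gradients are summed over several forward/backward passes before a single optimizer step. A minimal sketch, where model, dataloader, and the accum_steps value are assumptions rather than the repository's exact training loop:

import torch

accum_steps = 8  # assumed value: effective batch size 8 with per-step batch size 1
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for step, batch in enumerate(dataloader):  # dataloader yields dicts of tensors (assumed)
    outputs = model(**batch)               # HF models return .loss when labels are passed
    loss = outputs.loss / accum_steps      # scale so accumulated gradients average correctly
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()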

πŸ“„ Citation

If you use this model or the associated research in your work, please cite the original paper:

@inproceedings{wilaiwongsakul2025empirical,
  title={Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation},
  author={Wilaiwongsakul, Attavit and Liang, Bin and Jia, Wenfeng and Chen, Fang},
  booktitle={Asian Conference on Pattern Recognition (ACPR)},
  year={2025}
}