---
language: en
license: mit
library_name: transformers
datasets:
- restor/tcd
tags:
- image-segmentation
- segformer
- remote-sensing
- tree-crown-delineation
model-index:
- name: attavit14203638/segformer-b5-tcd-optimized
  results:
  - task:
      type: image-segmentation
    dataset:
      name: restor/tcd
      type: restor/tcd
    metrics:
    - name: Boundary IoU
      type: boundary_iou
      value: 0.6201
    - name: IoU
      type: iou
      value: 0.848
    - name: F1-Score
      type: f1
      value: 0.918
---
# Optimized SegFormer (B5) for Tree Crown Delineation

**State-of-the-Art Results: the Optimized Standard SegFormer achieves a Boundary IoU of 0.6201 on the OAM-TCD dataset.**
This model is the official "Optimized Standard SegFormer" from the ACPR 2025 paper, "Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation." It demonstrates that a well-optimized training strategy is more critical than architectural modifications for achieving superior boundary delineation in tree crown segmentation tasks.
The key to this model's performance is the combination of class weighting and full-resolution (H×W) loss supervision during training.
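The sketch below illustrates this training objective: the logits are bilinearly upsampled to the label's full H×W resolution before a class-weighted cross-entropy is applied. The function name, dummy tensor shapes, and weight values are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def full_resolution_weighted_loss(logits, labels, class_weights):
    """Class-weighted cross-entropy computed at the label's full H x W resolution.

    SegFormer outputs logits at 1/4 of the input resolution; upsampling them to
    the ground-truth size before the loss penalizes boundary errors at full
    detail, while `class_weights` re-balances background vs. tree-crown pixels.
    """
    upsampled = F.interpolate(
        logits, size=labels.shape[-2:], mode="bilinear", align_corners=False
    )
    return F.cross_entropy(upsampled, labels, weight=class_weights)

# Example with dummy tensors (2 classes: background, tree crown)
logits = torch.randn(1, 2, 256, 256)           # model output at H/4 x W/4
labels = torch.randint(0, 2, (1, 1024, 1024))  # ground truth at H x W
weights = torch.tensor([0.6, 1.4])             # illustrative values, not the paper's
loss = full_resolution_weighted_loss(logits, labels, weights)
```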
## How to Use
This model can be easily loaded from the Hub and used for inference.
First, ensure you have the necessary libraries installed:
```bash
pip install transformers torch Pillow numpy requests
```
Then, use the following Python snippet to run inference on an image:
```python
import numpy as np
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation

# Load the image processor and model from the Hub
processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
model = AutoModelForSemanticSegmentation.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
model.eval()

# Example: Load an image from a URL
# url = "https://huggingface.co/datasets/restor/tcd/resolve/main/data/images/test/490.png"
# image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Or, load a local image
image = Image.open("path/to/your/image.png").convert("RGB")

# Preprocess the image
inputs = processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the output to get the segmentation map
logits = outputs.logits

# Upsample logits to the original image size
upsampled_logits = torch.nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL size is (width, height), so reverse to (height, width)
    mode="bilinear",
    align_corners=False,
)

# Get the predicted class for each pixel as a numpy array
pred_seg = upsampled_logits.argmax(dim=1)[0].cpu().numpy()

# To visualize, convert the segmentation map to a color image.
# This model has 2 classes: background (0) and tree_crown (1).
color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
color_seg[pred_seg == 1] = [0, 255, 0]  # Green for tree crowns
color_seg_img = Image.fromarray(color_seg)

# Blend the segmentation map with the original image for visualization
blended_img = Image.blend(image.convert("RGBA"), color_seg_img.convert("RGBA"), alpha=0.5)

# Save or display the result
blended_img.save("segmentation_result.png")
blended_img.show()
```
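As a small follow-up (an illustrative addition, not part of the snippet above), the predicted mask can be summarized as a canopy-cover estimate:

```python
# Fraction of pixels predicted as tree crown (class 1)
crown_fraction = float((pred_seg == 1).mean())
print(f"Predicted tree crown cover: {crown_fraction:.1%}")
```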
## Model Details
- **Architecture:** Standard SegFormer with an `nvidia/mit-b5` backbone.
- **Optimization:** Trained with class weighting and full-resolution loss supervision. This means the model's logits are upsampled to the original image dimensions before the loss is calculated, forcing the model to learn fine-grained boundaries.
- **Image Processor:** The image processor (`preprocessor_config.json`) is configured with `do_resize=False`, ensuring that images are processed at their native resolution (a quick check is shown below).
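If you want to confirm this locally, the setting can be inspected directly on the loaded processor. The attribute name follows the standard SegFormer image processor in `transformers`; treat this as an illustrative check rather than part of the official usage example.

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("attavit14203638/segformer-b5-tcd-optimized")
print(processor.do_resize)  # expected: False, so inputs keep their native resolution
```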
## Evaluation
The model was trained and evaluated on the `restor/tcd` dataset. The key finding of the paper is that this optimized standard SegFormer outperforms more complex architectural variants.
| Configuration | IoU | F1-Score | Boundary IoU | Notes |
|---|---|---|---|---|
| **Std SegFormer + CW + H×W Loss (This Model)** | 0.848 | 0.918 | **0.620** | State-of-the-art |
| Std SegFormer + H×W Loss | 0.828 | 0.906 | 0.610 | Shows H×W loss impact |
| Std SegFormer + CW + H/4 Loss | 0.844 | 0.916 | 0.606 | Strong baseline |
| Std SegFormer (baseline) | 0.817 | 0.899 | 0.590 | Standard configuration |
| TrueResSegFormer + CW | 0.838 | 0.912 | 0.589 | Architectural variant |
| TrueResSegFormer (no CW) | 0.797 | 0.887 | 0.577 | Without optimization |
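Boundary IoU evaluates overlap only inside a thin band along each mask's contour, which makes it far more sensitive to crown-edge quality than standard IoU. Below is a simplified sketch of such a metric for binary masks; the band width and the use of `scipy` erosion are assumptions, and the paper's exact evaluation settings may differ.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_iou(gt: np.ndarray, pred: np.ndarray, band: int = 2) -> float:
    """IoU restricted to a thin band along each binary mask's contour."""
    structure = np.ones((3, 3), dtype=bool)
    # Boundary band = mask minus its erosion by `band` pixels
    gt_band = gt & ~binary_erosion(gt, structure, iterations=band)
    pred_band = pred & ~binary_erosion(pred, structure, iterations=band)
    union = np.logical_or(gt_band, pred_band).sum()
    if union == 0:
        return 1.0  # both boundary bands empty
    return float(np.logical_and(gt_band, pred_band).sum() / union)

# Example: compare the predicted mask from the inference snippet above with a
# ground-truth mask loaded as a boolean numpy array of the same H x W shape.
# score = boundary_iou(gt_mask.astype(bool), pred_seg == 1)
```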
### Training Procedure
The model was trained using the following command from the original repository:
```bash
python main.py train \
  --dataset_name restor/tcd \
  --model_name nvidia/mit-b5 \
  --output_dir ./outputs/segformer_b5_cw_full_res \
  --num_epochs 50 \
  --learning_rate 1e-5 \
  --class_weights_enabled \
  --apply_loss_at_original_resolution
```
- **Optimizer:** AdamW
- **Learning Rate:** `1e-5`
- **Epochs:** 50
- **Batch Size:** Varies based on hardware (e.g., 1 per GPU with gradient accumulation; see the sketch after this list)
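For reference, here is a minimal PyTorch sketch of these hyperparameters with gradient accumulation. The accumulation step count, `model`, and `train_loader` (yielding dicts with `"pixel_values"` and `"labels"`) are assumptions; it also omits the class weighting and full-resolution loss described above, which the repository's training script adds.

```python
import torch

# Assumed to exist: `model` (a SegFormer for semantic segmentation) and
# `train_loader` (batches of preprocessed images and segmentation labels).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
accum_steps = 8  # illustrative: simulates a larger batch with per-GPU batch size 1

model.train()
for epoch in range(50):
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        outputs = model(pixel_values=batch["pixel_values"], labels=batch["labels"])
        loss = outputs.loss / accum_steps  # scale so accumulated gradients match the larger batch
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```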
## Citation
If you use this model or the associated research in your work, please cite the original paper:
```bibtex
@inproceedings{wilaiwongsakul2025empirical,
  title     = {Empirical Insights into Optimizing SegFormer for High-Fidelity Tree Crown Delineation},
  author    = {Wilaiwongsakul, Attavit and Liang, Bin and Jia, Wenfeng and Chen, Fang},
  booktitle = {Asian Conference on Pattern Recognition (ACPR)},
  year      = {2025}
}
```