MVTec LOCO Foreground Segmentation Tool

This tool uses U²-Net to generate binary foreground masks for the images in the MVTec LOCO anomaly detection dataset.

Overview

The mvtec_loco_fg_segmentation.py script processes the entire MVTec LOCO dataset and generates a binary foreground mask for every image. It runs the U²-Net model for salient object detection and thresholds each resulting probability map into a binary mask.

Features

Complete Dataset Processing: Processes all categories (breakfast_box, screw_bag, juice_bottle, splicing_connectors, pushpins)
Flexible Structure: Handles both test and train splits with all subdirectories (good, logical_anomalies, structural_anomalies)
Binary Mask Output: Generates clean binary masks (0/255) in L mode (grayscale)
Configurable Parameters: Customizable threshold, categories, splits, and processing options
GPU/CPU Support: Automatic detection and utilization of available hardware

Requirements

Environment Setup

# Create conda environment
conda create -n u2net python=3.8 -y
conda activate u2net

# Install dependencies
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --index-url https://download.pytorch.org/whl/cu116
pip install opencv-python scikit-image matplotlib numpy pillow

Model Weights

Option 1: Automatic Download (Recommended)

# Install HuggingFace Hub
pip install huggingface_hub

# The model will be automatically downloaded when you run the script
python mvtec_loco_fg_segmentation.py

Option 2: Manual Download

Download u2net.pth (176.3 MB) from Google Drive
Place it in: ./saved_models/u2net/u2net.pth

Option 3: Download from HuggingFace

# Download only the model
python download_from_hf.py --model-only

# Or download the complete repository
python download_from_hf.py --complete-repo

Dataset Structure

Ensure your MVTec LOCO dataset follows this structure:

mvtec_loco_anomaly_detection/
├── breakfast_box/
│   ├── test/
│   │   ├── good/
│   │   ├── logical_anomalies/
│   │   └── structural_anomalies/
│   └── train/
│       └── good/
├── screw_bag/
│   ├── test/
│   └── train/
└── ... (other categories)
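
As a rough illustration of how this layout can be traversed, the sketch below walks the category/split/subdirectory levels with pathlib. The function names `iter_image_dirs` and `list_images` and the set of accepted file extensions are assumptions made for this example; the actual script may enumerate files differently.

```python
from pathlib import Path

CATEGORIES = ("breakfast_box", "screw_bag", "juice_bottle",
              "splicing_connectors", "pushpins")
SPLITS = ("test", "train")
# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def iter_image_dirs(root, categories=CATEGORIES, splits=SPLITS):
    """Yield (category, split, subdir) for every existing subdirectory."""
    for category in categories:
        for split in splits:
            split_dir = Path(root) / category / split
            if not split_dir.is_dir():
                continue  # skip categories or splits that are not present
            for subdir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                yield category, split, subdir


def list_images(directory):
    """Return the image files in a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `iter_image_dirs("/path/to/mvtec_loco")` would yield entries such as `("breakfast_box", "test", .../good)`, and `list_images` then gives the files to segment in each one.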

Usage

Basic Usage

# Process entire dataset with default settings
python mvtec_loco_fg_segmentation.py

# Show help
python mvtec_loco_fg_segmentation.py -h

Advanced Usage

# Specify custom dataset and model paths
python mvtec_loco_fg_segmentation.py \
    --dataset_path /path/to/mvtec_loco \
    --model_path /path/to/u2net.pth

# Process specific categories only
python mvtec_loco_fg_segmentation.py \
    --categories breakfast_box juice_bottle

# Process only test split
python mvtec_loco_fg_segmentation.py \
    --splits test

# Use different threshold for binary mask generation
python mvtec_loco_fg_segmentation.py \
    --threshold 0.3

# Custom output directory name
python mvtec_loco_fg_segmentation.py \
    --output_dir custom_masks

# Speed up processing with more data loading workers and a larger batch size
python mvtec_loco_fg_segmentation.py \
    --num_workers 4 \
    --batch_size 4

Command Line Arguments

Argument	Type	Default	Description
`--dataset_path`	str	`/root/hy-data/datasets/mvtec_loco_anomaly_detection`	Path to MVTec LOCO dataset root
`--model_path`	str	`./saved_models/u2net/u2net.pth`	Path to U²-Net model weights
`--output_dir`	str	`fg_mask`	Output directory name for masks
`--threshold`	float	`0.5`	Threshold for binary mask generation
`--categories`	list	all 5 categories	Categories to process
`--splits`	list	`['test', 'train']`	Dataset splits to process
`--batch_size`	int	`1`	Batch size for processing
`--num_workers`	int	`1`	Number of data loading workers
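
For readers who want to mirror this interface in their own code, the table above corresponds roughly to the following argparse setup. This is a sketch reconstructed from the documented arguments and defaults only; `build_parser` is a hypothetical name, and the real script's parser may differ in details such as help text or validation.

```python
import argparse

CATEGORIES = ["breakfast_box", "screw_bag", "juice_bottle",
              "splicing_connectors", "pushpins"]


def build_parser():
    """Build a parser matching the documented arguments (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Generate foreground masks for MVTec LOCO")
    parser.add_argument("--dataset_path", type=str,
                        default="/root/hy-data/datasets/mvtec_loco_anomaly_detection")
    parser.add_argument("--model_path", type=str,
                        default="./saved_models/u2net/u2net.pth")
    parser.add_argument("--output_dir", type=str, default="fg_mask")
    parser.add_argument("--threshold", type=float, default=0.5)
    # nargs="+" accepts one or more space-separated values.
    parser.add_argument("--categories", nargs="+", default=CATEGORIES)
    parser.add_argument("--splits", nargs="+", default=["test", "train"])
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--num_workers", type=int, default=1)
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--categories", "breakfast_box", "juice_bottle", "--threshold", "0.3"])
```

Here `args.categories` holds the two requested categories, while `args.splits` keeps its default of both splits.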

Output Structure

The script generates masks in the following structure:

mvtec_loco_anomaly_detection/
├── fg_mask/                    # Generated masks directory
│   ├── breakfast_box/
│   │   ├── test/
│   │   │   ├── good/
│   │   │   │   ├── 000.png     # Binary mask (0/255 values)
│   │   │   │   ├── 001.png
│   │   │   │   └── ...
│   │   │   ├── logical_anomalies/
│   │   │   └── structural_anomalies/
│   │   └── train/
│   │       └── good/
│   └── ... (other categories)
└── ... (original dataset)
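
The layout above mirrors the dataset tree under the output directory. One way to compute a mask path from an image path is sketched below; `mask_path_for` is a hypothetical helper written for this example, and it assumes masks are always saved with a .png suffix.

```python
from pathlib import Path


def mask_path_for(image_path, dataset_root, output_dir="fg_mask"):
    """Map <root>/<category>/<split>/<subdir>/<name> to the mirrored
    location under <root>/<output_dir>/, always with a .png suffix."""
    image_path = Path(image_path)
    dataset_root = Path(dataset_root)
    # Path of the image relative to the dataset root,
    # e.g. breakfast_box/test/good/000.png
    relative = image_path.relative_to(dataset_root)
    return (dataset_root / output_dir / relative).with_suffix(".png")


example = mask_path_for(
    "/data/mvtec_loco/breakfast_box/test/good/000.png", "/data/mvtec_loco")
```

Before saving, the parent directory can be created with `example.parent.mkdir(parents=True, exist_ok=True)`.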

Mask Properties

Format: PNG images
Mode: L (grayscale, single channel)
Values: Binary (0 for background, 255 for foreground)
Size: Same as original images
Threshold: Configurable (default 0.5)
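
The thresholding step behind these properties can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes the probability map is already scaled to [0, 1], that the comparison is strict (`>`), and `to_binary_mask` is a hypothetical name.

```python
import numpy as np


def to_binary_mask(prob_map, threshold=0.5):
    """Convert a probability map in [0, 1] to a uint8 mask of 0 and 255."""
    prob = np.asarray(prob_map, dtype=np.float32)
    # Assumption: pixels strictly above the threshold count as foreground.
    return np.where(prob > threshold, 255, 0).astype(np.uint8)


mask = to_binary_mask([[0.1, 0.6],
                       [0.5, 0.9]])
```

A two-dimensional uint8 array like this becomes an L-mode image when passed to Pillow's `Image.fromarray`, which matches the mode listed above. Lowering the threshold (for example to 0.3) marks more pixels as foreground.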

Performance Notes

GPU Recommended: Processing is significantly faster with a CUDA-enabled GPU
Memory Usage: Each image requires ~200 MB of GPU memory during processing
Processing Time: ~2-3 seconds per image on a modern GPU
Total Images: ~5000+ images in complete dataset

Troubleshooting

Common Issues

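CUDA Out of Memory: Reduce --batch_size or use CPU processing
Model Not Found: Ensure u2net.pth is at the location given by --model_path (default ./saved_models/u2net/u2net.pth)
Dataset Path Error: Verify that --dataset_path points to the dataset root and that it matches the structure shown under Dataset Structure
Permission Errors: Check write permissions for the output directory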

Error Messages

ERROR: Dataset path not found: Check dataset path and extraction
ERROR: Model path not found: Download and place u2net.pth correctly
ERROR: Invalid categories: Use valid category names
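
Most of these problems can be caught before any image is processed. The sketch below shows one possible pre-flight check; the `preflight` function and the exact wording of its messages are assumptions for illustration and are not taken from the script.

```python
import os
from pathlib import Path

VALID_CATEGORIES = {"breakfast_box", "screw_bag", "juice_bottle",
                    "splicing_connectors", "pushpins"}


def preflight(dataset_path, model_path, categories):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    dataset = Path(dataset_path)
    if not dataset.is_dir():
        problems.append(f"Dataset path not found: {dataset}")
    elif not os.access(dataset, os.W_OK):
        # Masks are written below the dataset root, so it must be writable.
        problems.append(f"No write permission for: {dataset}")
    if not Path(model_path).is_file():
        problems.append(f"Model path not found: {model_path}")
    invalid = sorted(set(categories) - VALID_CATEGORIES)
    if invalid:
        problems.append(f"Invalid categories: {invalid}")
    return problems
```

Running `preflight(...)` with the same values passed on the command line and printing any returned problems gives a quick diagnosis without loading the model.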

Example Output

The script provides detailed progress information:

Configuration:
  Dataset path: /root/hy-data/datasets/mvtec_loco_anomaly_detection
  Model path: ./saved_models/u2net/u2net.pth
  Output directory: fg_mask
  Binary threshold: 0.5
  Categories: ['breakfast_box', 'screw_bag', 'juice_bottle', 'splicing_connectors', 'pushpins']
  Splits: ['test', 'train']

...load U2NET---
Processing category: breakfast_box
  Processing breakfast_box/test/good
    Found 102 images
    Processing 1/102: 000.png
    Processing 20/102: 019.png
    ...

Citation

If you use this tool in your research, please cite the original U²-Net paper:

@article{Qin_2020_PR,
  title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
  author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
  journal = {Pattern Recognition},
  volume = {106},
  pages = {107404},
  year = {2020}
}

License

This tool extends the original U²-Net implementation. Please refer to the original repository for license information.