MVTec LOCO Foreground Segmentation Tool
This tool uses U²-Net to generate binary foreground masks for the MVTec LOCO anomaly detection dataset.
Overview
The mvtec_loco_fg_segmentation.py script processes the entire MVTec LOCO dataset and generates binary foreground masks for all images. It uses the U²-Net model to perform salient object detection and converts the resulting probability maps to binary masks.
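The conversion from probability map to binary mask can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `probability_to_mask` is hypothetical, and it assumes the model output has already been normalized to the range [0, 1] and resized to the original image size. Whether the script compares with `>` or `>=` is also an assumption.

```python
import numpy as np
from PIL import Image


def probability_to_mask(prob: np.ndarray, threshold: float = 0.5) -> Image.Image:
    """Convert a 2D saliency probability map in [0, 1] to a binary mask.

    Hypothetical helper: pixels strictly above the threshold become
    foreground (255); all other pixels become background (0).
    """
    binary = (prob > threshold).astype(np.uint8) * 255
    # A 2D uint8 array is interpreted by Pillow as an L-mode (grayscale) image.
    return Image.fromarray(binary)
```

Lowering the threshold (for example to 0.3) marks more pixels as foreground, which may help when the object boundary is detected with low confidence.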
Features
- Complete Dataset Processing: Processes all categories (breakfast_box, screw_bag, juice_bottle, splicing_connectors, pushpins)
- Flexible Structure: Handles both test and train splits with all subdirectories (good, logical_anomalies, structural_anomalies)
- Binary Mask Output: Generates clean binary masks (0/255) in L mode (grayscale)
- Configurable Parameters: Customizable threshold, categories, splits, and processing options
- GPU/CPU Support: Automatic detection and utilization of available hardware
Requirements
Environment Setup
# Create conda environment
conda create -n u2net python=3.8 -y
conda activate u2net
# Install dependencies
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --index-url https://download.pytorch.org/whl/cu116
pip install opencv-python scikit-image matplotlib numpy pillow
Model Weights
Option 1: Automatic Download (Recommended)
# Install HuggingFace Hub
pip install huggingface_hub
# The model will be automatically downloaded when you run the script
python mvtec_loco_fg_segmentation.py
Option 2: Manual Download
- Download u2net.pth (176.3 MB) from Google Drive
- Place it in: ./saved_models/u2net/u2net.pth
Option 3: Download from HuggingFace
# Download only the model
python download_from_hf.py --model-only
# Or download the complete repository
python download_from_hf.py --complete-repo
Dataset Structure
Ensure your MVTec LOCO dataset follows this structure:
mvtec_loco_anomaly_detection/
├── breakfast_box/
│   ├── test/
│   │   ├── good/
│   │   ├── logical_anomalies/
│   │   └── structural_anomalies/
│   └── train/
│       └── good/
├── screw_bag/
│   ├── test/
│   └── train/
└── ... (other categories)
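To check that a local copy matches this layout, a traversal along the following lines may help. It is a sketch under assumptions: `iter_image_dirs` is a hypothetical helper, not part of the script, and it simply skips any category or split directory that does not exist.

```python
from pathlib import Path
from typing import Iterator, Sequence, Tuple

CATEGORIES = ["breakfast_box", "screw_bag", "juice_bottle",
              "splicing_connectors", "pushpins"]
SPLITS = ["test", "train"]


def iter_image_dirs(
    root: str,
    categories: Sequence[str] = CATEGORIES,
    splits: Sequence[str] = SPLITS,
) -> Iterator[Tuple[str, str, Path]]:
    """Yield (category, split, directory) for every existing subdirectory."""
    for category in categories:
        for split in splits:
            split_dir = Path(root) / category / split
            if not split_dir.is_dir():
                continue  # missing category or split: nothing to yield
            # Subdirectories are good, logical_anomalies, structural_anomalies.
            for sub in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                yield category, split, sub
```

Printing the yielded tuples gives a quick overview of which directories the tool would be expected to visit.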
Usage
Basic Usage
# Process entire dataset with default settings
python mvtec_loco_fg_segmentation.py
# Show help
python mvtec_loco_fg_segmentation.py -h
Advanced Usage
# Specify custom dataset and model paths
python mvtec_loco_fg_segmentation.py \
--dataset_path /path/to/mvtec_loco \
--model_path /path/to/u2net.pth
# Process specific categories only
python mvtec_loco_fg_segmentation.py \
--categories breakfast_box juice_bottle
# Process only test split
python mvtec_loco_fg_segmentation.py \
--splits test
# Use different threshold for binary mask generation
python mvtec_loco_fg_segmentation.py \
--threshold 0.3
# Custom output directory name
python mvtec_loco_fg_segmentation.py \
--output_dir custom_masks
# Optimize processing with multiple workers
python mvtec_loco_fg_segmentation.py \
--num_workers 4 \
--batch_size 4
Command Line Arguments
Argument | Type | Default | Description |
---|---|---|---|
--dataset_path | str | /root/hy-data/datasets/mvtec_loco_anomaly_detection | Path to MVTec LOCO dataset root |
--model_path | str | ./saved_models/u2net/u2net.pth | Path to U2NET model weights |
--output_dir | str | fg_mask | Output directory name for masks |
--threshold | float | 0.5 | Threshold for binary mask generation |
--categories | list | all 5 categories | Categories to process |
--splits | list | ['test', 'train'] | Dataset splits to process |
--batch_size | int | 1 | Batch size for processing |
--num_workers | int | 1 | Number of data loading workers |
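For readers who want to see how such an interface is typically declared, the table can be mirrored with `argparse` roughly as below. This is a reconstruction from the table only, not the script's actual parser: `build_parser` is a hypothetical name, and the use of `nargs="+"` for the list arguments is an assumption based on the usage examples.

```python
import argparse

CATEGORIES = ["breakfast_box", "screw_bag", "juice_bottle",
              "splicing_connectors", "pushpins"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults follow the table above."""
    parser = argparse.ArgumentParser(
        description="Generate foreground masks for MVTec LOCO (sketch)")
    parser.add_argument("--dataset_path", type=str,
                        default="/root/hy-data/datasets/mvtec_loco_anomaly_detection")
    parser.add_argument("--model_path", type=str,
                        default="./saved_models/u2net/u2net.pth")
    parser.add_argument("--output_dir", type=str, default="fg_mask")
    parser.add_argument("--threshold", type=float, default=0.5)
    # nargs="+" accepts one or more space-separated values (assumed behavior).
    parser.add_argument("--categories", nargs="+", default=list(CATEGORIES))
    parser.add_argument("--splits", nargs="+", default=["test", "train"])
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--num_workers", type=int, default=1)
    return parser
```

Calling `build_parser().parse_args(["--splits", "test"])` would then restrict processing to the test split while leaving every other option at its default.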
Output Structure
The script generates masks in the following structure:
mvtec_loco_anomaly_detection/
├── fg_mask/                          # Generated masks directory
│   ├── breakfast_box/
│   │   ├── test/
│   │   │   ├── good/
│   │   │   │   ├── 000.png           # Binary mask (0/255 values)
│   │   │   │   ├── 001.png
│   │   │   │   └── ...
│   │   │   ├── logical_anomalies/
│   │   │   └── structural_anomalies/
│   │   └── train/
│   │       └── good/
│   └── ... (other categories)
└── ... (original dataset)
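The mapping from an input image to its mask location can be expressed as a small path transformation. The helper below is a sketch: `mask_path_for` is a hypothetical name, and it assumes the mask keeps the image's relative path and is always saved with a .png extension.

```python
from pathlib import Path


def mask_path_for(image_path: str, dataset_root: str,
                  output_dir: str = "fg_mask") -> Path:
    """Return where the mask for image_path would be stored.

    The image's path relative to the dataset root is re-rooted under
    <dataset_root>/<output_dir>/ (assumed layout, per the tree above).
    """
    relative = Path(image_path).relative_to(dataset_root)
    return (Path(dataset_root) / output_dir / relative).with_suffix(".png")
```

With this mapping, each mask can be paired with its source image by path alone, without a separate index file.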
Mask Properties
- Format: PNG images
- Mode: L (grayscale, single channel)
- Values: Binary (0 for background, 255 for foreground)
- Size: Same as original images
- Threshold: Configurable (default 0.5)
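The properties listed above can be checked on a generated mask with a few lines of Pillow. This is an illustrative sketch: `is_valid_binary_mask` is a hypothetical helper, not something the tool provides.

```python
from typing import Optional, Tuple

from PIL import Image


def is_valid_binary_mask(mask: Image.Image,
                         expected_size: Optional[Tuple[int, int]] = None) -> bool:
    """Check that a mask is L mode, optionally sized as expected, and 0/255 only."""
    if mask.mode != "L":
        return False
    if expected_size is not None and mask.size != expected_size:
        return False
    # For L-mode images, histogram() returns 256 counts, one per gray level.
    histogram = mask.histogram()
    # Any pixel with a value from 1 to 254 means the mask is not strictly binary.
    return sum(histogram[1:255]) == 0
```

A typical use would be `is_valid_binary_mask(Image.open(mask_file), Image.open(image_file).size)`, where both file names are placeholders for a mask and its source image.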
Performance Notes
- GPU Recommended: Processing is significantly faster with a CUDA-enabled GPU
- Memory Usage: Each image requires ~200 MB of GPU memory during processing
- Processing Time: ~2-3 seconds per image on a modern GPU
- Total Images: ~5000+ images in complete dataset
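Taking the figures above at face value, a rough estimate of the total runtime follows from simple arithmetic. The numbers are the approximate ones quoted in this section, so the result is only an order-of-magnitude guide.

```python
def estimated_hours(num_images: int, seconds_per_image: float) -> float:
    """Return the total processing time in hours for sequential processing."""
    return num_images * seconds_per_image / 3600.0


low = estimated_hours(5000, 2.0)   # optimistic: 2 seconds per image
high = estimated_hours(5000, 3.0)  # pessimistic: 3 seconds per image
print(f"{low:.1f} to {high:.1f} hours")  # prints "2.8 to 4.2 hours"
```

Restricting the run with --categories or --splits reduces the image count and therefore the runtime proportionally.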
Troubleshooting
Common Issues
- CUDA Out of Memory: Reduce batch size or use CPU processing
- Model Not Found: Ensure u2net.pth is in the correct directory (./saved_models/u2net/ by default)
- Dataset Path Error: Verify MVTec LOCO dataset structure
- Permission Errors: Check write permissions for output directory
Error Messages
- ERROR: Dataset path not found: Check dataset path and extraction
- ERROR: Model path not found: Download and place u2net.pth correctly
- ERROR: Invalid categories: Use valid category names
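Before a long run, the conditions behind these three messages can be checked up front. The function below is a sketch of such a pre-flight check, assuming the messages correspond to a missing dataset directory, a missing weights file, and an unknown category name. `preflight_errors` is a hypothetical helper and does not reflect how the script itself performs validation.

```python
from pathlib import Path
from typing import List, Sequence

VALID_CATEGORIES = ("breakfast_box", "screw_bag", "juice_bottle",
                    "splicing_connectors", "pushpins")


def preflight_errors(dataset_path: str, model_path: str,
                     categories: Sequence[str]) -> List[str]:
    """Return the error messages that apply to the given configuration."""
    errors: List[str] = []
    if not Path(dataset_path).is_dir():
        errors.append("ERROR: Dataset path not found")
    if not Path(model_path).is_file():
        errors.append("ERROR: Model path not found")
    if any(name not in VALID_CATEGORIES for name in categories):
        errors.append("ERROR: Invalid categories")
    return errors
```

An empty list means none of the three conditions was detected; it does not guarantee that the dataset contents are complete.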
Example Output
The script provides detailed progress information:
Configuration:
Dataset path: /root/hy-data/datasets/mvtec_loco_anomaly_detection
Model path: ./saved_models/u2net/u2net.pth
Output directory: fg_mask
Binary threshold: 0.5
Categories: ['breakfast_box', 'screw_bag', 'juice_bottle', 'splicing_connectors', 'pushpins']
Splits: ['test', 'train']
...load U2NET---
Processing category: breakfast_box
Processing breakfast_box/test/good
Found 102 images
Processing 1/102: 000.png
Processing 20/102: 019.png
...
Citation
If you use this tool in your research, please cite the original U²-Net paper:
@InProceedings{Qin_2020_PR,
title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
journal = {Pattern Recognition},
volume = {106},
pages = {107404},
year = {2020}
}
License
This tool extends the original U²-Net implementation. Please refer to the original repository for license information.