MVTec LOCO Foreground Segmentation Tool

Chinese README | English README

This tool uses U²-Net to generate binary foreground masks for the images in the MVTec LOCO anomaly detection dataset.

Overview

The mvtec_loco_fg_segmentation.py script processes the entire MVTec LOCO dataset and generates binary foreground masks for all images. It uses the U²-Net model to perform salient object detection and converts the probability maps to binary masks.

Features

  • Complete Dataset Processing: Processes all categories (breakfast_box, screw_bag, juice_bottle, splicing_connectors, pushpins)
  • Flexible Structure: Handles both test and train splits with all subdirectories (good, logical_anomalies, structural_anomalies)
  • Binary Mask Output: Generates clean binary masks (0/255) in L mode (grayscale)
  • Configurable Parameters: Customizable threshold, categories, splits, and processing options
  • GPU/CPU Support: Automatic detection and utilization of available hardware

Requirements

Environment Setup

# Create conda environment
conda create -n u2net python=3.8 -y
conda activate u2net

# Install dependencies
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --index-url https://download.pytorch.org/whl/cu116
pip install opencv-python scikit-image matplotlib numpy pillow

Model Weights

Option 1: Automatic Download (Recommended)

# Install HuggingFace Hub
pip install huggingface_hub

# The model will be automatically downloaded when you run the script
python mvtec_loco_fg_segmentation.py

Option 2: Manual Download

  • Download u2net.pth (176.3 MB) from Google Drive
  • Place it in: ./saved_models/u2net/u2net.pth

Option 3: Download from HuggingFace

# Download only the model
python download_from_hf.py --model-only

# Or download the complete repository
python download_from_hf.py --complete-repo

Dataset Structure

Ensure your MVTec LOCO dataset follows this structure:

mvtec_loco_anomaly_detection/
├── breakfast_box/
│   ├── test/
│   │   ├── good/
│   │   ├── logical_anomalies/
│   │   └── structural_anomalies/
│   └── train/
│       └── good/
├── screw_bag/
│   ├── test/
│   └── train/
└── ... (other categories)
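
A quick pre-flight check can catch layout problems before a long run. The sketch below is not part of the tool: the function name find_missing_dirs is hypothetical, and it assumes every category mirrors the breakfast_box layout shown above.

```python
from pathlib import Path

# Sub-directories implied by the layout above.
# Assumption: every category mirrors breakfast_box.
EXPECTED_SUBDIRS = (
    "test/good",
    "test/logical_anomalies",
    "test/structural_anomalies",
    "train/good",
)


def find_missing_dirs(dataset_root, categories):
    """Return the expected directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for category in categories:
        for subdir in EXPECTED_SUBDIRS:
            candidate = root / category / subdir
            if not candidate.is_dir():
                missing.append(candidate)
    return missing
```

An empty result means the layout matches; otherwise the returned paths show which directories to create or re-extract.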

Usage

Basic Usage

# Process entire dataset with default settings
python mvtec_loco_fg_segmentation.py

# Show help
python mvtec_loco_fg_segmentation.py -h

Advanced Usage

# Specify custom dataset and model paths
python mvtec_loco_fg_segmentation.py \
    --dataset_path /path/to/mvtec_loco \
    --model_path /path/to/u2net.pth

# Process specific categories only
python mvtec_loco_fg_segmentation.py \
    --categories breakfast_box juice_bottle

# Process only test split
python mvtec_loco_fg_segmentation.py \
    --splits test

# Use different threshold for binary mask generation
python mvtec_loco_fg_segmentation.py \
    --threshold 0.3

# Custom output directory name
python mvtec_loco_fg_segmentation.py \
    --output_dir custom_masks

# Optimize processing with multiple workers
python mvtec_loco_fg_segmentation.py \
    --num_workers 4 \
    --batch_size 4

Command Line Arguments

Argument        Type   Default                                              Description
--dataset_path  str    /root/hy-data/datasets/mvtec_loco_anomaly_detection  Path to MVTec LOCO dataset root
--model_path    str    ./saved_models/u2net/u2net.pth                       Path to U2NET model weights
--output_dir    str    fg_mask                                              Output directory name for masks
--threshold     float  0.5                                                  Threshold for binary mask generation
--categories    list   all 5 categories                                     Categories to process
--splits        list   ['test', 'train']                                    Dataset splits to process
--batch_size    int    1                                                    Batch size for processing
--num_workers   int    1                                                    Number of data loading workers
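
For readers who want to reuse the same interface in their own wrapper, the arguments above could be declared with argparse roughly as follows. This is a reconstruction from the table, not the script's actual source; the function name build_parser and the help strings are illustrative.

```python
import argparse

# Category order taken from the configuration printout in this README.
ALL_CATEGORIES = [
    "breakfast_box", "screw_bag", "juice_bottle",
    "splicing_connectors", "pushpins",
]


def build_parser():
    """Build a parser mirroring the documented arguments (a sketch only)."""
    parser = argparse.ArgumentParser(
        description="Generate foreground masks for MVTec LOCO")
    parser.add_argument(
        "--dataset_path", type=str,
        default="/root/hy-data/datasets/mvtec_loco_anomaly_detection",
        help="Path to MVTec LOCO dataset root")
    parser.add_argument(
        "--model_path", type=str, default="./saved_models/u2net/u2net.pth",
        help="Path to U2NET model weights")
    parser.add_argument(
        "--output_dir", type=str, default="fg_mask",
        help="Output directory name for masks")
    parser.add_argument(
        "--threshold", type=float, default=0.5,
        help="Threshold for binary mask generation")
    # nargs="+" accepts one or more space-separated values, as in
    # "--categories breakfast_box juice_bottle".
    parser.add_argument(
        "--categories", nargs="+", default=ALL_CATEGORIES,
        help="Categories to process")
    parser.add_argument(
        "--splits", nargs="+", default=["test", "train"],
        help="Dataset splits to process")
    parser.add_argument("--batch_size", type=int, default=1,
                        help="Batch size for processing")
    parser.add_argument("--num_workers", type=int, default=1,
                        help="Number of data loading workers")
    return parser
```

Calling build_parser().parse_args([]) yields the defaults listed in the table.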

Output Structure

The script generates masks in the following structure:

mvtec_loco_anomaly_detection/
├── fg_mask/                    # Generated masks directory
│   ├── breakfast_box/
│   │   ├── test/
│   │   │   ├── good/
│   │   │   │   ├── 000.png     # Binary mask (0/255 values)
│   │   │   │   ├── 001.png
│   │   │   │   └── ...
│   │   │   ├── logical_anomalies/
│   │   │   └── structural_anomalies/
│   │   └── train/
│   │       └── good/
│   └── ... (other categories)
└── ... (original dataset)
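
When pairing images with their masks in downstream code, the mapping implied by this layout can be sketched as below. The helper name mask_path_for is hypothetical, and it assumes masks keep the image's relative path and file stem, as the tree above suggests.

```python
from pathlib import Path


def mask_path_for(image_path, dataset_root, output_dir="fg_mask"):
    """Map an image path inside the dataset to its expected mask path.

    Assumption: the mask mirrors the image's path relative to the dataset
    root, placed under <dataset_root>/<output_dir>/ and saved as PNG.
    """
    root = Path(dataset_root)
    relative = Path(image_path).relative_to(root)
    return (root / output_dir / relative).with_suffix(".png")
```

For example, an image at breakfast_box/test/good/000.png under the dataset root maps to fg_mask/breakfast_box/test/good/000.png under the same root.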

Mask Properties

  • Format: PNG images
  • Mode: L (grayscale, single channel)
  • Values: Binary (0 for background, 255 for foreground)
  • Size: Same as original images
  • Threshold: Configurable (default 0.5)
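
To make the thresholding step concrete, here is a minimal pure-Python sketch of turning a probability map into 0/255 values. It is an illustration only: the function name binarize is hypothetical, and whether the script compares with > or >= at the threshold is an assumption (strict > is used here).

```python
def binarize(prob_map, threshold=0.5):
    """Convert a 2-D list of probabilities in [0, 1] to 0/255 mask values.

    Assumption: pixels strictly above the threshold count as foreground.
    """
    return [[255 if p > threshold else 0 for p in row] for row in prob_map]


probabilities = [
    [0.10, 0.90],
    [0.40, 0.60],
]
mask = binarize(probabilities, threshold=0.5)
print(mask)  # [[0, 255], [0, 255]]
```

Lowering the threshold (for example --threshold 0.3) marks more pixels as foreground, which can help when objects are only weakly salient.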

Performance Notes

  • GPU Recommended: Processing is significantly faster with a CUDA-enabled GPU
  • Memory Usage: Each image requires ~200 MB of GPU memory during processing
  • Processing Time: ~2-3 seconds per image on a modern GPU
  • Total Images: ~5000+ images in the complete dataset
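
Taken together, the figures above give a rough runtime estimate for a full pass. The small calculation below only restates that arithmetic, using this README's approximate numbers (5000 images, 2 to 3 seconds each); actual times depend on hardware and settings.

```python
def estimate_hours(num_images, seconds_per_image):
    """Return the estimated wall-clock time in hours for a sequential run."""
    return num_images * seconds_per_image / 3600


low = estimate_hours(5000, 2)   # optimistic end of the quoted range
high = estimate_hours(5000, 3)  # pessimistic end of the quoted range
print(f"{low:.1f} to {high:.1f} hours")  # prints "2.8 to 4.2 hours"
```

Restricting the run with --categories or --splits shortens it proportionally.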

Troubleshooting

Common Issues

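  1. CUDA Out of Memory: Reduce the batch size (--batch_size) or use CPU processing
  2. Model Not Found: Ensure u2net.pth is at the path given by --model_path (default: ./saved_models/u2net/u2net.pth)
  3. Dataset Path Error: Verify that --dataset_path points to the dataset root and that the MVTec LOCO directory structure is intact
  4. Permission Errors: Check write permissions for the output directory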

Error Messages

  • ERROR: Dataset path not found: Check dataset path and extraction
  • ERROR: Model path not found: Download and place u2net.pth correctly
  • ERROR: Invalid categories: Use valid category names

Example Output

The script provides detailed progress information:

Configuration:
  Dataset path: /root/hy-data/datasets/mvtec_loco_anomaly_detection
  Model path: ./saved_models/u2net/u2net.pth
  Output directory: fg_mask
  Binary threshold: 0.5
  Categories: ['breakfast_box', 'screw_bag', 'juice_bottle', 'splicing_connectors', 'pushpins']
  Splits: ['test', 'train']

...load U2NET---
Processing category: breakfast_box
  Processing breakfast_box/test/good
    Found 102 images
    Processing 1/102: 000.png
    Processing 20/102: 019.png
    ...

Citation

If you use this tool in your research, please cite the original U²-Net paper:

@article{Qin_2020_PR,
  title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
  author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
  journal = {Pattern Recognition},
  volume = {106},
  pages = {107404},
  year = {2020}
}

License

This tool extends the original U²-Net implementation. Please refer to the original repository for license information.
