Model Card for Duo Image Classification AutoML

Binary image classifier trained using AutoGluon's AutoML framework with neural network architecture search across multiple CNN backbones.

Model Details

Model Description

This model performs binary image classification using automated machine learning (AutoML) to select the optimal CNN architecture. The model was trained through systematic comparison of ResNet, EfficientNet, and MobileNet variants, with selection based on performance on unseen original data rather than augmented validation data.

  • Developed by: [Your Name/Institution]
  • Model type: Convolutional Neural Network (CNN) via AutoML
  • Language(s): Not applicable (image classification, not NLP)
  • License: MIT
  • Finetuned from model: Pre-trained TIMM models (various CNN architectures)

Model Sources

Uses

Direct Use

This model assigns each image to one of two classes defined by the duo-image dataset. Intended applications include:

  • Educational demonstrations of AutoML techniques
  • Binary image classification for similar visual domains
  • Benchmarking CNN architecture performance on binary tasks

Downstream Use

The model could potentially be fine-tuned for related binary image classification tasks, though performance would depend on visual similarity to the training domain.

Out-of-Scope Use

  • Production systems requiring high reliability without further validation
  • Multi-class classification without retraining
  • Classification of images significantly different from the training domain
  • Applications where model explainability is critical
  • Safety-critical applications

Bias, Risks, and Limitations

Methodological Concerns:

  • Validation accuracy approached 100% across multiple architectures, suggesting potential data leakage or an overly simplistic task
  • Model may have learned to recognize augmentation artifacts rather than true visual features
  • High performance across diverse architectures indicates the classification task may not represent real-world complexity
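One way to probe the leakage concern is to check whether any evaluation file is byte-identical to a training file. The sketch below is a minimal, standard-library illustration and not part of the original pipeline; the function names are hypothetical. It cannot detect an augmented copy of an image, only an exact duplicate.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_exact_overlap(train_paths, eval_paths):
    """Return evaluation files whose bytes also appear in the training set.

    Only byte-identical copies are detected; a rotated, cropped, or
    recolored version of an evaluation image will not match.
    """
    train_digests = {file_digest(p) for p in train_paths}
    return [str(p) for p in eval_paths if file_digest(p) in train_digests]
```

An empty result rules out exact duplicates only; near-duplicates produced by augmentation would need a perceptual comparison instead.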

Technical Limitations:

  • Trained on a specific image domain with limited diversity
  • Binary classification only
  • Performance heavily dependent on synthetic data augmentation
  • Potential overfitting to augmented data patterns

Recommendations

Users should focus on the test set performance (95% accuracy) rather than validation metrics. The model should be validated on additional diverse datasets before real-world deployment. Results suggest the dataset may not represent a challenging classification problem, limiting generalizability.

How to Get Started with the Model

import cloudpickle
from huggingface_hub import hf_hub_download
import pandas as pd

# Download and load the model (unpickling requires autogluon.multimodal to be installed)
model_path = hf_hub_download(
    repo_id="maryzhang/24679-image-automl-nn-duo-predictor",
    filename="autogluon_best_image_predictor.pkl"
)

with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare data (DataFrame with 'image' column containing file paths)
test_data = pd.DataFrame({'image': ['path/to/your/image.jpg']})

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)

print(f"Prediction: {predictions.iloc[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")

Training Details

Training Data

Dataset: scottymcgee/duo-image-dataset

  • Training split: 70% of total data (augmented subset)
  • Validation split: 20% of augmented data
  • Test split: 30% of total data (original, non-augmented images)
  • Problem type: Binary classification
  • Preprocessing: Images materialized to disk, automatic preprocessing by AutoGluon

Training Procedure

Preprocessing

Images were extracted from the Hugging Face dataset format and saved to disk as required by AutoGluon MultiModalPredictor. AutoGluon handled all image preprocessing automatically based on the selected CNN architecture.
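The materialization step can be sketched roughly as follows. This is an illustration under assumptions, not the author's code: the column names `image` and `label`, the PNG file naming, and the helper name `materialize_images` are all hypothetical. It accepts any iterable of records whose image value has a PIL-style `.save(path)` method, which is what Hugging Face image datasets yield when iterated.

```python
from pathlib import Path


def materialize_images(examples, out_dir, image_key="image", label_key="label"):
    """Save each example's image to disk and return rows of {image path, label}.

    `examples` is any iterable of dict-like records whose image value has a
    PIL-style `.save(path)` method. The key names are assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for i, example in enumerate(examples):
        path = out_dir / f"img_{i:05d}.png"
        example[image_key].save(path)  # write the image file to disk
        rows.append({"image": str(path), "label": example[label_key]})
    return rows
```

Passing the returned rows to `pd.DataFrame(...)` gives a table with an `image` column of file paths, the same shape used in the prediction example earlier in this card.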

Training Hyperparameters

  • AutoML Framework: AutoGluon MultiModalPredictor
  • Preset: medium_quality
  • Time budget: 10 minutes per architecture (40 minutes total)
  • Architectures tested: ResNet18, ResNet34, EfficientNet-B0, MobileNetV3-Small
  • Model selection: Based on performance on original (non-augmented) test data
  • Training regime: Mixed precision (handled automatically by AutoGluon)
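A per-architecture run along these lines could look like the sketch below. It is a hedged reconstruction, not the original training script: the timm checkpoint names are assumed to correspond to the architectures listed above, the label column name `label` is an assumption, and `train_candidate` is a hypothetical helper. The AutoGluon import is deferred so the configuration part can be inspected without AutoGluon installed.

```python
# timm checkpoint names ASSUMED to match the architectures listed above.
BACKBONES = ["resnet18", "resnet34", "efficientnet_b0", "mobilenetv3_small_100"]
TIME_LIMIT_SECONDS = 10 * 60  # 10 minutes per architecture


def build_hyperparameters(backbone):
    """AutoGluon hyperparameter overrides that pin the image backbone."""
    return {
        "model.names": ["timm_image"],
        "model.timm_image.checkpoint_name": backbone,
    }


def train_candidate(train_df, backbone, label="label"):
    """Fit one candidate; requires autogluon.multimodal to be installed."""
    from autogluon.multimodal import MultiModalPredictor  # imported lazily

    predictor = MultiModalPredictor(label=label, problem_type="binary")
    predictor.fit(
        train_data=train_df,
        presets="medium_quality",
        hyperparameters=build_hyperparameters(backbone),
        time_limit=TIME_LIMIT_SECONDS,
    )
    return predictor
```

With four backbones at 600 seconds each, the total budget comes to 2,400 seconds, which matches the 40 minutes stated above.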

Speeds, Sizes, Times

  • Total training time: ~40 minutes across 4 architectures
  • Architecture evaluation: 10 minutes per CNN variant
  • Model selection: Automatic based on test performance
  • Hardware: Single GPU training environment

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Primary evaluation: Original dataset (30% of total data, non-augmented)
  • Secondary evaluation: Augmented validation set (20% of augmented data)

Performance on the original dataset is treated as the true test performance because it contains unmodified images not seen during training.

Factors

Evaluation considered both augmented and original data performance to detect potential overfitting to synthetic augmentation patterns.

Metrics

  • Primary: Accuracy (appropriate for balanced binary classification)
  • Secondary: F1-score (weighted and binary)
  • Analysis: Per-class precision and recall via classification reports

Results

Test Set Performance (Original Data) - Primary Metric

  • Accuracy: 95.0%
  • Weighted F1: 95.0%
  • Binary F1: 96.0%

Per-class breakdown:

  • Class 0: Precision=0.93, Recall=0.93, F1=0.93
  • Class 1: Precision=0.96, Recall=0.96, F1=0.96
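To make the relationship between these figures concrete, the sketch below computes accuracy, per-class scores, binary F1, and support-weighted F1 from label lists using only the standard library. The labels are hypothetical and are NOT the actual test set; the counts were chosen only so that the arithmetic lands near the reported values. It also assumes that "Binary F1" refers to the F1 of class 1.

```python
def per_class_scores(y_true, y_pred, positive):
    """Precision, recall, F1 and support for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


def summarize(y_true, y_pred):
    """Accuracy, binary F1 (class 1 assumed positive) and support-weighted F1."""
    scores = {c: per_class_scores(y_true, y_pred, c) for c in sorted(set(y_true))}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    weighted_f1 = sum(s["f1"] * s["support"] for s in scores.values()) / len(y_true)
    return {
        "accuracy": accuracy,
        "binary_f1": scores[1]["f1"],
        "weighted_f1": weighted_f1,
        "per_class": scores,
    }


# Hypothetical labels for illustration only (not the real test data):
# 14 class-0 images with one error, 28 class-1 images with one error.
y_true = [0] * 14 + [1] * 28
y_pred = [0] * 13 + [1] + [1] * 27 + [0]
report = summarize(y_true, y_pred)
print({k: round(v, 2) for k, v in report.items() if k != "per_class"})
```

The weighted F1 is the per-class F1 averaged by class support, so it sits between the two per-class values and closer to the larger class.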

Validation Performance (Augmented Data)

  • Accuracy: ~100% (concerning - see limitations)

Summary

The model achieves strong performance on original test data (95% accuracy) but showed unrealistically high validation performance across multiple architectures. This pattern suggests the validation methodology may have issues, making the test set results more trustworthy for assessing real-world performance.

Environmental Impact

Training ran under a limited time budget (about 40 minutes in total on a single GPU) using AutoGluon's automated approach.

Technical Specifications

Model Architecture and Objective

Best Architecture: [To be updated based on AutoML results]

  • Input: RGB images (automatic resizing by AutoGluon)
  • Output: Binary classification probabilities
  • Backbone: Pre-trained TIMM model selected via AutoML
  • Objective: Cross-entropy loss for binary classification
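As a reference for the objective, the sketch below shows cross-entropy over a two-class output in plain Python. It assumes a two-logit head followed by softmax, which is a common setup; the card does not show how AutoGluon implements its loss internally, so this illustrates the formula only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[target])
```

For equal logits the two classes each get probability 0.5, so the loss equals ln 2; the loss shrinks as the logit of the true class grows relative to the other.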

Compute Infrastructure

Hardware

  • GPU-accelerated training (CUDA compatible)
  • Sufficient memory for batch processing of images
  • Standard Google Colab environment

Software

  • AutoGluon: 1.4.0+
  • Python: 3.9+ (required by AutoGluon 1.4)
  • PyTorch: Latest compatible version
  • TIMM: For pre-trained CNN backbones
  • Dependencies: pandas, scikit-learn, PIL/Pillow

Citation

BibTeX:

@misc{duo_image_automl_2024,
  title={Duo Image Classification via AutoML Neural Architecture Search},
  author={[Your Name]},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor},
  note={Educational AutoML demonstration with methodological considerations}
}

Dataset Citation:

@dataset{scottymcgee_duo_dataset,
  title={Duo Image Dataset},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/duo-image-dataset}
}

More Information

This model was developed as part of an educational assignment to explore AutoML techniques for neural network architecture selection in computer vision. The concerning validation results (near-perfect accuracy across architectures) highlight important lessons about:

  • Proper experimental design in machine learning
  • The importance of realistic evaluation methodologies
  • Potential pitfalls in synthetic data augmentation
  • The value of honest reporting in academic work

The 95% test accuracy represents a more realistic assessment of model performance and should be the primary metric for evaluating this work.

Model Card Authors

Mary Zhang

Model Card Contact

[email protected]

AI Usage

Claude was used to edit functions and debug code.
