Model Card for Duo Image Classification AutoML
Binary image classifier trained using AutoGluon's AutoML framework with neural network architecture search across multiple CNN backbones.
Model Details
Model Description
This model performs binary image classification using automated machine learning (AutoML) to select the optimal CNN architecture. The model was trained through systematic comparison of ResNet, EfficientNet, and MobileNet variants, with selection based on performance on unseen original data rather than augmented validation data.
- Developed by: [Your Name/Institution]
- Model type: Convolutional Neural Network (CNN) via AutoML
- Language(s): Not applicable (image classification, not natural language processing)
- License: MIT
- Finetuned from model: Pre-trained TIMM models (various CNN architectures)
Model Sources
- Repository: https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor
- Dataset: https://huggingface.co/datasets/scottymcgee/duo-image-dataset
Uses
Direct Use
This model classifies images into two binary categories based on the duo-image dataset. Intended applications include:
- Educational demonstrations of AutoML techniques
- Binary image classification for similar visual domains
- Benchmarking CNN architecture performance on binary tasks
Downstream Use
The model could potentially be fine-tuned for related binary image classification tasks, though performance would depend on visual similarity to the training domain.
Out-of-Scope Use
- Production systems requiring high reliability without further validation
- Multi-class classification without retraining
- Classification of images significantly different from the training domain
- Applications where model explainability is critical
- Safety-critical applications
Bias, Risks, and Limitations
Methodological Concerns:
- Validation accuracy approached 100% across multiple architectures, suggesting potential data leakage or an overly simplistic task
- Model may have learned to recognize augmentation artifacts rather than true visual features
- High performance across diverse architectures indicates the classification task may not represent real-world complexity
Technical Limitations:
- Trained on a specific image domain with limited diversity
- Binary classification only
- Performance heavily dependent on synthetic data augmentation
- Potential overfitting to augmented data patterns
Recommendations
Users should focus on the test set performance (95% accuracy) rather than validation metrics. The model should be validated on additional diverse datasets before real-world deployment. Results suggest the dataset may not represent a challenging classification problem, limiting generalizability.
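If the dataset pipeline is revisited, one way to rule out augmentation leakage is to hold out the test set from the original images before any augmentation is applied. A minimal sketch, assuming an `image`/`label` DataFrame of original file paths and a hypothetical `augment_images` helper:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical listing of the ORIGINAL (non-augmented) images and their labels.
original = pd.DataFrame({
    "image": ["data/original/img_000.jpg", "data/original/img_001.jpg"],  # ... all original paths
    "label": [0, 1],
})

# 1) Hold out the test split from the original images first (matches the 70/30 split).
train_df, test_df = train_test_split(
    original, test_size=0.30, stratify=original["label"], random_state=42
)

# 2) Augment only the training portion; augment_images() is a hypothetical helper.
augmented_train = augment_images(train_df)

# 3) Carve the validation split out of the augmented training data only.
train_df, val_df = train_test_split(
    augmented_train, test_size=0.20, stratify=augmented_train["label"], random_state=42
)
```

With this ordering, no augmented variant of a test image can reach training or validation, so near-perfect validation accuracy would be easier to interpret.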
How to Get Started with the Model
```python
import cloudpickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download and load the pickled AutoGluon predictor
model_path = hf_hub_download(
    repo_id="maryzhang/24679-image-automl-nn-duo-predictor",
    filename="autogluon_best_image_predictor.pkl",
)
with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare data (DataFrame with an 'image' column containing file paths)
test_data = pd.DataFrame({'image': ['path/to/your/image.jpg']})

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)
print(f"Prediction: {predictions[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")
```
Training Details
Training Data
Dataset: scottymcgee/duo-image-dataset
- Training split: 70% of total data (augmented subset)
- Validation split: 20% of the augmented training data
- Test split: 30% of total data (original, non-augmented images)
- Problem type: Binary classification
- Preprocessing: Images materialized to disk, automatic preprocessing by AutoGluon
Training Procedure
Preprocessing
Images were extracted from the Hugging Face dataset format and saved to disk as required by AutoGluon MultiModalPredictor. AutoGluon handled all image preprocessing automatically based on the selected CNN architecture.
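A minimal sketch of that materialization step, assuming the dataset exposes `image` (PIL) and `label` columns and a `train` split; the output directory is a placeholder:

```python
from pathlib import Path

import pandas as pd
from datasets import load_dataset

# Assumed layout: "image" (PIL image) and "label" columns in a "train" split.
ds = load_dataset("scottymcgee/duo-image-dataset", split="train")

out_dir = Path("duo_images")
out_dir.mkdir(exist_ok=True)

rows = []
for i, example in enumerate(ds):
    path = out_dir / f"{i:05d}.jpg"
    example["image"].convert("RGB").save(path)  # write each image to disk for AutoGluon
    rows.append({"image": str(path), "label": example["label"]})

train_data = pd.DataFrame(rows)  # file paths + labels, as MultiModalPredictor expects
```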
Training Hyperparameters
- AutoML Framework: AutoGluon MultiModalPredictor
- Preset: medium_quality
- Time budget: 10 minutes per architecture (40 minutes total)
- Architectures tested: ResNet18, ResNet34, EfficientNet-B0, MobileNetV3-Small
- Model selection: Based on performance on original (non-augmented) test data
- Training regime: Mixed precision (handled automatically by AutoGluon)
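A hedged sketch of how such a per-backbone search might be expressed with MultiModalPredictor; the timm checkpoint names and the `train_data`/`val_data`/`test_data` DataFrames (image paths plus a `label` column) are assumptions, not a record of the actual training script:

```python
from autogluon.multimodal import MultiModalPredictor

# Assumed timm checkpoint names for the four candidate backbones.
backbones = ["resnet18", "resnet34", "efficientnet_b0", "mobilenetv3_small_100"]

results = {}
for name in backbones:
    predictor = MultiModalPredictor(label="label", presets="medium_quality")
    predictor.fit(
        train_data=train_data,          # augmented training split
        tuning_data=val_data,           # augmented validation split
        hyperparameters={"model.timm_image.checkpoint_name": name},
        time_limit=10 * 60,             # 10-minute budget per architecture
    )
    # Select on the ORIGINAL, non-augmented test set rather than augmented validation.
    results[name] = predictor.evaluate(test_data, metrics=["accuracy"])["accuracy"]

best_backbone = max(results, key=results.get)
```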
Speeds, Sizes, Times
- Total training time: ~40 minutes across 4 architectures
- Architecture evaluation: 10 minutes per CNN variant
- Model selection: Automatic based on test performance
- Hardware: Single GPU training environment
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Primary evaluation: original test split (30% of total data, non-augmented)
- Secondary evaluation: augmented validation set (20% of the augmented training data)
The original test split is the better indicator of true performance, as it contains unmodified images that were not seen during training.
Factors
Evaluation considered both augmented and original data performance to detect potential overfitting to synthetic augmentation patterns.
Metrics
- Primary: Accuracy (appropriate for balanced binary classification)
- Secondary: F1-score (weighted and binary)
- Analysis: Per-class precision and recall via classification reports
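These metrics can be reproduced from the predictor's outputs with scikit-learn; a minimal sketch, assuming a fitted `predictor` and a `test_data` DataFrame whose ground-truth column is named `label`:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

y_true = test_data["label"]
y_pred = predictor.predict(test_data.drop(columns=["label"]))

print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print("Binary F1:  ", f1_score(y_true, y_pred, average="binary"))
print(classification_report(y_true, y_pred))  # per-class precision, recall, F1
```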
Results
Test Set Performance (Original Data) - Primary Metric
- Accuracy: 95.0%
- Weighted F1: 95.0%
- Binary F1: 96.0%
Per-class breakdown:
- Class 0: Precision=0.93, Recall=0.93, F1=0.93
- Class 1: Precision=0.96, Recall=0.96, F1=0.96
Validation Performance (Augmented Data)
- Accuracy: ~100% (concerning - see limitations)
Summary
The model achieves strong performance on original test data (95% accuracy) but showed unrealistically high validation performance across multiple architectures. This pattern suggests the validation methodology may have issues, making the test set results more trustworthy for assessing real-world performance.
Environmental Impact
Training compute was modest: roughly 40 minutes on a single GPU in a Google Colab environment under AutoGluon's time-budgeted search, so energy use and associated emissions were correspondingly small.
Technical Specifications
Model Architecture and Objective
Best Architecture: [To be updated based on AutoML results]
- Input: RGB images (automatic resizing by AutoGluon)
- Output: Binary classification probabilities
- Backbone: Pre-trained TIMM model selected via AutoML
- Objective: Cross-entropy loss for binary classification
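For illustration only, a hedged sketch of what this setup reduces to: a pre-trained timm backbone with a two-class head trained under cross-entropy. The `resnet18` checkpoint is a placeholder until the winning architecture is recorded above:

```python
import timm
import torch
import torch.nn as nn

# Placeholder backbone; the actual model is whichever variant AutoML selected.
model = timm.create_model("resnet18", pretrained=True, num_classes=2)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)   # batch of RGB images (resized by the pipeline)
labels = torch.tensor([0, 1, 1, 0])
logits = model(images)                 # shape (4, 2): class logits
loss = criterion(logits, labels)       # binary task treated as 2-class cross-entropy
```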
Compute Infrastructure
Hardware
- GPU-accelerated training (CUDA compatible)
- Sufficient memory for batch processing of images
- Standard Google Colab environment
Software
- AutoGluon: 1.4.0+
- Python: 3.9+ (required by AutoGluon 1.4)
- PyTorch: Latest compatible version
- TIMM: For pre-trained CNN backbones
- Dependencies: pandas, scikit-learn, PIL/Pillow
Citation
BibTeX:
```bibtex
@misc{duo_image_automl_2024,
  title={Duo Image Classification via AutoML Neural Architecture Search},
  author={[Your Name]},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor},
  note={Educational AutoML demonstration with methodological considerations}
}
```
Dataset Citation:
```bibtex
@dataset{scottymcgee_duo_dataset,
  title={Duo Image Dataset},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/duo-image-dataset}
}
```
More Information
This model was developed as part of an educational assignment to explore AutoML techniques for neural network architecture selection in computer vision. The concerning validation results (near-perfect accuracy across architectures) highlight important lessons about:
- Proper experimental design in machine learning
- The importance of realistic evaluation methodologies
- Potential pitfalls in synthetic data augmentation
- The value of honest reporting in academic work
The 95% test accuracy represents a more realistic assessment of model performance and should be the primary metric for evaluating this work.
Model Card Authors
Mary Zhang
Model Card Contact
AI Usage
Claude was used to edit functions and debug code.