Model Card for Duo Image Classification AutoML
Binary image classifier trained using AutoGluon's AutoML framework with neural network architecture search across multiple CNN backbones.
Model Details
Model Description
This model performs binary image classification using automated machine learning (AutoML) to select the optimal CNN architecture. The model was trained through systematic comparison of ResNet, EfficientNet, and MobileNet variants, with selection based on performance on unseen original data rather than augmented validation data.
- Developed by: [Your Name/Institution]
- Model type: Convolutional Neural Network (CNN) via AutoML
- Language(s): Not applicable (image classification, not natural language processing)
- License: MIT
- Finetuned from model: Pre-trained TIMM models (various CNN architectures)
Model Sources
- Repository: https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor
- Dataset: https://huggingface.co/datasets/scottymcgee/duo-image-dataset
Uses
Direct Use
This model classifies images into two binary categories based on the duo-image dataset. Intended applications include:
- Educational demonstrations of AutoML techniques
- Binary image classification for similar visual domains
- Benchmarking CNN architecture performance on binary tasks
Downstream Use
The model could potentially be fine-tuned for related binary image classification tasks, though performance would depend on visual similarity to the training domain.
Out-of-Scope Use
- Production systems requiring high reliability without further validation
- Multi-class classification without retraining
- Classification of images significantly different from the training domain
- Applications where model explainability is critical
- Safety-critical applications
Bias, Risks, and Limitations
Methodological Concerns:
- Validation accuracy approached 100% across multiple architectures, suggesting potential data leakage or an overly simplistic task
- Model may have learned to recognize augmentation artifacts rather than true visual features
- High performance across diverse architectures indicates the classification task may not represent real-world complexity
Technical Limitations:
- Trained on a specific image domain with limited diversity
- Binary classification only
- Performance heavily dependent on synthetic data augmentation
- Potential overfitting to augmented data patterns
Recommendations
Users should focus on the test set performance (95% accuracy) rather than validation metrics. The model should be validated on additional diverse datasets before real-world deployment. Results suggest the dataset may not represent a challenging classification problem, limiting generalizability.
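If the dataset pipeline is revisited, one way to rule out augmentation leakage is to hold out the test set from the original images before any augmentation is applied. A minimal sketch, assuming an `image`/`label` DataFrame of original file paths and a hypothetical `augment_images` helper:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical listing of the ORIGINAL (non-augmented) images and their labels.
original = pd.DataFrame({
    "image": ["data/original/img_000.jpg", "data/original/img_001.jpg"],  # ... all original paths
    "label": [0, 1],
})

# 1) Hold out the test split from the original images first (matches the 70/30 split).
train_df, test_df = train_test_split(
    original, test_size=0.30, stratify=original["label"], random_state=42
)

# 2) Augment only the training portion; augment_images() is a hypothetical helper.
augmented_train = augment_images(train_df)

# 3) Carve the validation split out of the augmented training data only.
train_df, val_df = train_test_split(
    augmented_train, test_size=0.20, stratify=augmented_train["label"], random_state=42
)
```

With this ordering, no augmented variant of a test image can reach training or validation, so near-perfect validation accuracy would be easier to interpret.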
How to Get Started with the Model
```python
import cloudpickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download and load the pickled AutoGluon predictor
model_path = hf_hub_download(
    repo_id="maryzhang/24679-image-automl-nn-duo-predictor",
    filename="autogluon_best_image_predictor.pkl",
)
with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare data (DataFrame with an 'image' column containing file paths)
test_data = pd.DataFrame({'image': ['path/to/your/image.jpg']})

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)
print(f"Prediction: {predictions[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")
```
Training Details
Training Data
Dataset: scottymcgee/duo-image-dataset
- Training split: 70% of total data (augmented subset)
- Validation split: 20% of the augmented training data
- Test split: 30% of total data (original, non-augmented images)
- Problem type: Binary classification
- Preprocessing: Images materialized to disk, automatic preprocessing by AutoGluon
Training Procedure
Preprocessing
Images were extracted from the Hugging Face dataset format and saved to disk as required by AutoGluon MultiModalPredictor. AutoGluon handled all image preprocessing automatically based on the selected CNN architecture.
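A minimal sketch of that materialization step, assuming the dataset exposes `image` (PIL) and `label` columns and a `train` split; the output directory is a placeholder:

```python
from pathlib import Path

import pandas as pd
from datasets import load_dataset

# Assumed layout: "image" (PIL image) and "label" columns in a "train" split.
ds = load_dataset("scottymcgee/duo-image-dataset", split="train")

out_dir = Path("duo_images")
out_dir.mkdir(exist_ok=True)

rows = []
for i, example in enumerate(ds):
    path = out_dir / f"{i:05d}.jpg"
    example["image"].convert("RGB").save(path)  # write each image to disk for AutoGluon
    rows.append({"image": str(path), "label": example["label"]})

train_data = pd.DataFrame(rows)  # file paths + labels, as MultiModalPredictor expects
```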
Training Hyperparameters
- AutoML Framework: AutoGluon MultiModalPredictor
- Preset: medium_quality
- Time budget: 10 minutes per architecture (40 minutes total)
- Architectures tested: ResNet18, ResNet34, EfficientNet-B0, MobileNetV3-Small
- Model selection: Based on performance on original (non-augmented) test data
- Training regime: Mixed precision (handled automatically by AutoGluon)
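A hedged sketch of how such a per-backbone search might be expressed with MultiModalPredictor; the timm checkpoint names and the `train_data`/`val_data`/`test_data` DataFrames (image paths plus a `label` column) are assumptions, not a record of the actual training script:

```python
from autogluon.multimodal import MultiModalPredictor

# Assumed timm checkpoint names for the four candidate backbones.
backbones = ["resnet18", "resnet34", "efficientnet_b0", "mobilenetv3_small_100"]

results = {}
for name in backbones:
    predictor = MultiModalPredictor(label="label", presets="medium_quality")
    predictor.fit(
        train_data=train_data,          # augmented training split
        tuning_data=val_data,           # augmented validation split
        hyperparameters={"model.timm_image.checkpoint_name": name},
        time_limit=10 * 60,             # 10-minute budget per architecture
    )
    # Select on the ORIGINAL, non-augmented test set rather than augmented validation.
    results[name] = predictor.evaluate(test_data, metrics=["accuracy"])["accuracy"]

best_backbone = max(results, key=results.get)
```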
Speeds, Sizes, Times
- Total training time: ~40 minutes across 4 architectures
- Architecture evaluation: 10 minutes per CNN variant
- Model selection: Automatic based on test performance
- Hardware: Single GPU training environment
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Primary evaluation: original test split (30% of total data, non-augmented)
- Secondary evaluation: augmented validation set (20% of the augmented training data)
The original test split is the better indicator of true performance, as it contains unmodified images that were not seen during training.
Factors
Evaluation considered both augmented and original data performance to detect potential overfitting to synthetic augmentation patterns.
Metrics
- Primary: Accuracy (appropriate for balanced binary classification)
- Secondary: F1-score (weighted and binary)
- Analysis: Per-class precision and recall via classification reports
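These metrics can be reproduced from the predictor's outputs with scikit-learn; a minimal sketch, assuming a fitted `predictor` and a `test_data` DataFrame whose ground-truth column is named `label`:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

y_true = test_data["label"]
y_pred = predictor.predict(test_data.drop(columns=["label"]))

print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print("Binary F1:  ", f1_score(y_true, y_pred, average="binary"))
print(classification_report(y_true, y_pred))  # per-class precision, recall, F1
```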
Results
Test Set Performance (Original Data) - Primary Metric
- Accuracy: 95.0%
- Weighted F1: 95.0%
- Binary F1: 96.0%
Per-class breakdown:
- Class 0: Precision=0.93, Recall=0.93, F1=0.93
- Class 1: Precision=0.96, Recall=0.96, F1=0.96
Validation Performance (Augmented Data)
- Accuracy: ~100% (concerning - see limitations)
Summary
The model achieves strong performance on original test data (95% accuracy) but showed unrealistically high validation performance across multiple architectures. This pattern suggests the validation methodology may have issues, making the test set results more trustworthy for assessing real-world performance.
Environmental Impact
Training compute was modest: roughly 40 minutes on a single GPU in a Google Colab environment under AutoGluon's time-budgeted search, so energy use and associated emissions were correspondingly small.
Technical Specifications
Model Architecture and Objective
Best Architecture: [To be updated based on AutoML results]
- Input: RGB images (automatic resizing by AutoGluon)
- Output: Binary classification probabilities
- Backbone: Pre-trained TIMM model selected via AutoML
- Objective: Cross-entropy loss for binary classification
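For illustration only, a hedged sketch of what this setup reduces to: a pre-trained timm backbone with a two-class head trained under cross-entropy. The `resnet18` checkpoint is a placeholder until the winning architecture is recorded above:

```python
import timm
import torch
import torch.nn as nn

# Placeholder backbone; the actual model is whichever variant AutoML selected.
model = timm.create_model("resnet18", pretrained=True, num_classes=2)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)   # batch of RGB images (resized by the pipeline)
labels = torch.tensor([0, 1, 1, 0])
logits = model(images)                 # shape (4, 2): class logits
loss = criterion(logits, labels)       # binary task treated as 2-class cross-entropy
```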
Compute Infrastructure
Hardware
- GPU-accelerated training (CUDA compatible)
- Sufficient memory for batch processing of images
- Standard Google Colab environment
Software
- AutoGluon: 1.4.0+
- Python: 3.9+ (required by AutoGluon 1.4)
- PyTorch: Latest compatible version
- TIMM: For pre-trained CNN backbones
- Dependencies: pandas, scikit-learn, PIL/Pillow
Citation
BibTeX:
```bibtex
@misc{duo_image_automl_2024,
  title={Duo Image Classification via AutoML Neural Architecture Search},
  author={[Your Name]},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor},
  note={Educational AutoML demonstration with methodological considerations}
}
```
Dataset Citation:
```bibtex
@dataset{scottymcgee_duo_dataset,
  title={Duo Image Dataset},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/duo-image-dataset}
}
```
More Information
This model was developed as part of an educational assignment to explore AutoML techniques for neural network architecture selection in computer vision. The concerning validation results (near-perfect accuracy across architectures) highlight important lessons about:
- Proper experimental design in machine learning
- The importance of realistic evaluation methodologies
- Potential pitfalls in synthetic data augmentation
- The value of honest reporting in academic work
The 95% test accuracy represents a more realistic assessment of model performance and should be the primary metric for evaluating this work.
Model Card Authors
Mary Zhang
Model Card Contact
AI Usage
Claude was used to edit functions and debug code.