Stanford Dogs Image Classifiers
This repository contains a collection of pre-trained image classification models evaluated on the Stanford Dogs dataset. The project explores the application of transfer learning with several prominent Convolutional Neural Network (CNN) architectures to distinguish between 120 different dog breeds.
Models
The following architectures were trained and evaluated as part of this collection:
- MobileNetV2
- MobileNetV3Small
- EfficientNetB0
- EfficientNetB1
Dataset
The models were trained and evaluated on the Stanford Dogs dataset, which comprises images of 120 different dog breeds. The dataset is commonly used as a benchmark for fine-grained image classification tasks.
Methodology
The models were trained with a transfer learning and fine-tuning approach, starting from weights pre-trained on the ImageNet dataset. The training process involved:
- Loading and preprocessing the Stanford Dogs dataset.
- Applying data augmentation techniques (including flipping, rotation, zoom, brightness, contrast, saturation, and hue adjustments) to improve model robustness.
- Utilizing callbacks such as early stopping and learning rate reduction during training.
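To make the augmentation step concrete, here is a minimal, dependency-free sketch of two of the listed transforms (horizontal flip and brightness adjustment). It is not the pipeline from the notebooks, which presumably relies on TensorFlow image operations; the function names, the nested-list image representation, and the `max_delta` value are all assumptions made for illustration.

```python
import random

# Illustrative only: an "image" is a list of rows, and each pixel is an
# (R, G, B) tuple of integers in the range 0-255.

def flip_horizontal(image):
    """Mirror the image left-to-right by reversing every row."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add delta to every channel and clamp the result to 0-255."""
    return [
        [tuple(max(0, min(255, channel + delta)) for channel in pixel) for pixel in row]
        for row in image
    ]

def augment(image, rng, max_delta=30):
    """Flip with 50% probability, then shift brightness by up to +/- max_delta.

    max_delta=30 is an arbitrary example value, not taken from the notebooks.
    """
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    return adjust_brightness(image, rng.randint(-max_delta, max_delta))
```

Because each call draws new random parameters, the model sees a slightly different version of every training image in each epoch, which is what makes augmentation useful against overfitting.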
For a detailed breakdown of the data pipeline, augmentation strategy, and training procedures for each model, please refer to the original Jupyter notebooks in the source repository.
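The early stopping and learning rate reduction callbacks mentioned in the training steps can be illustrated with a small simulation over a list of validation losses. This is a simplified sketch of the general idea, not the Keras implementation and not the configuration used in the notebooks; the function name `simulate_callbacks` and every patience, factor, and learning rate value are hypothetical.

```python
def simulate_callbacks(val_losses, initial_lr=1e-3, lr_patience=2, lr_factor=0.5,
                       stop_patience=4, min_lr=1e-6):
    """Replay validation losses and apply two plateau rules.

    All default values are illustrative assumptions.
    Returns (history, stopped_epoch), where history holds (epoch, loss, lr)
    tuples and stopped_epoch is None if training ran to the end.
    """
    best_loss = float("inf")
    epochs_since_best = 0   # drives early stopping
    lr_wait = 0             # drives learning rate reduction
    lr = initial_lr
    history = []

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember it and reset both counters.
            best_loss = loss
            epochs_since_best = 0
            lr_wait = 0
        else:
            epochs_since_best += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                # Plateau: shrink the learning rate, but never below min_lr.
                lr = max(lr * lr_factor, min_lr)
                lr_wait = 0

        history.append((epoch, loss, lr))

        if epochs_since_best >= stop_patience:
            # No improvement for stop_patience epochs: stop training early.
            return history, epoch

    return history, None
```

For example, `simulate_callbacks([1.0, 0.8, 0.9, 0.85, 0.95, 0.9, 0.7])` halves the learning rate at epochs 4 and 6 and stops at epoch 6, so the final loss of 0.7 is never reached. This shows the trade-off behind choosing a patience value.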
Evaluation
Each model was evaluated on the test set of the Stanford Dogs dataset. The key performance metrics are summarized below:
Model | Parameters | Val-Acc @10ep | Test-Acc | Macro F1 |
---|---|---|---|---|
MobileNetV3Small | 1.01 M | 41.7% | 55% | 0.55 |
MobileNetV2 | 2.41 M | 75.6% | 89% | 0.89 |
EfficientNetB0 | 4.20 M | 77.6% | 90% | 0.90 |
EfficientNetB1 | 6.73 M | 72.4% | 77% | 0.77 |
Note: Val-Acc @10ep refers to the validation accuracy achieved after 10 training epochs. Test-Acc and Macro F1 are reported on the final test set evaluation.
Detailed classification reports (precision, recall, F1-score per class) and confusion matrices for each model can be found in the evaluation outputs of the corresponding notebooks.
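Macro F1, as reported in the table above, is the unweighted mean of the per-class F1 scores, so every breed counts equally regardless of how many test images it has. The sketch below shows that computation without any dependencies. It is not the evaluation code from the notebooks (which may well use a library routine), and the function name is an assumption for illustration.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Per class: F1 = 2*TP / (2*TP + FP + FN).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in classes:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denominator if denominator else 0.0)

    return sum(f1_scores) / len(f1_scores)
```

Because the mean is unweighted, a model that does well on common breeds but fails on a few rare ones gets a macro F1 noticeably below its accuracy.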
Usage
import sys

import cv2
import gradio as gr
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from PIL import Image

# Load the class names from the dataset metadata.
# Note: tfds.load downloads and prepares the dataset on first use if it is not cached.
try:
    print("Loading dataset info to get class names...")
    ds_info = tfds.load('stanford_dogs', split='test', with_info=True)[1]
    class_names = ds_info.features['label'].names
    num_classes = ds_info.features['label'].num_classes
    print(f"Successfully loaded class names ({num_classes}).")
except Exception as e:
    print(f"Error loading dataset info: {e}")
    print("Could not retrieve class names. Please ensure 'stanford_dogs' is available.")
    sys.exit(1)

# Placeholder: replace with the path to your trained model file.
model_path = 'path/to/your/EfficientNetB0_best_model.keras'

try:
    print(f"Loading model from {model_path}...")
    loaded_model = tf.keras.models.load_model(model_path)
    print("Model loaded successfully.")
except Exception as e:
    print(f"Error loading model: {e}")
    sys.exit(1)

# Read the expected input resolution from the model itself.
img_height, img_width = loaded_model.input_shape[1:3]
print(f"Model expects input size: {img_height}x{img_width}")


def classify_dog_breed(input_image):
    """Return a {class_name: probability} dict for a PIL image or NumPy array."""
    if isinstance(input_image, Image.Image):
        input_image = np.array(input_image)
    if not isinstance(input_image, np.ndarray):
        raise ValueError("Input is not a valid image format (expected numpy array or PIL Image).")

    # Normalize to three RGB channels.
    if input_image.ndim == 2:
        input_image = cv2.cvtColor(input_image, cv2.COLOR_GRAY2RGB)
    elif input_image.shape[-1] == 4:
        input_image = cv2.cvtColor(input_image, cv2.COLOR_RGBA2RGB)

    # Resize to the model's input size and convert back to NumPy.
    img_resized = tf.image.resize(input_image, (img_height, img_width))
    if tf.is_tensor(img_resized):
        img_resized_np = img_resized.numpy()
    else:
        img_resized_np = img_resized

    # Add the batch dimension and apply EfficientNet preprocessing.
    # When loading a different architecture, use its matching preprocess_input.
    img_batch = np.expand_dims(img_resized_np, axis=0)
    img_processed = tf.keras.applications.efficientnet.preprocess_input(img_batch)

    predictions = loaded_model.predict(img_processed, verbose=0)
    probabilities = predictions[0]
    output_scores = {class_names[i]: float(probabilities[i]) for i in range(len(class_names))}
    return output_scores


# Build the Gradio UI: one image input, top-5 label output.
image_input = gr.Image(type="pil", label="Upload Dog Photo")
output_label = gr.Label(num_top_classes=5)

iface = gr.Interface(
    fn=classify_dog_breed,
    inputs=image_input,
    outputs=output_label,
    title="Stanford Dogs Breed Classifier",
    description="Upload a photo of a dog and the model will predict its breed.",
)

print("Launching Gradio interface...")
iface.launch()
How to run this code:
- Save the code: Copy the Python code above and save it as a Python file (e.g., predict_dog.py) in your local environment.
- Install dependencies: Make sure all required libraries are installed, including tensorflow-datasets, which is needed to load the class names. Open a terminal or command prompt and run:
  pip install gradio tensorflow numpy Pillow opencv-python tensorflow-datasets
- Replace placeholders:
  - Update the model_path variable to the actual path of your trained .keras model file on your system.
  - Update the
- Run the script: Execute the Python script from your terminal:
  python predict_dog.py
- Access the app: The script prints a local URL (usually http://127.0.0.1:7860). Open this URL in your web browser.
- Upload and predict: Use the web interface to upload a photo of a dog. The app displays the five most likely breeds with their confidence scores.
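Outside the Gradio interface, the dictionary returned by `classify_dog_breed` can be reduced to the same kind of top-five view that the label widget shows. The helpers below are a hedged sketch: both function names are hypothetical, and `prettify_label` assumes the dataset's label strings follow a `<WordNet id>-<breed_name>` pattern such as `n02085620-chihuahua`, which you should verify against your own `class_names`.

```python
def top_k_predictions(scores, k=5):
    """Return the k highest-scoring (label, probability) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def prettify_label(raw_label):
    """Convert e.g. 'n02085782-japanese_spaniel' to 'Japanese Spaniel'.

    Assumption: labels start with an ID followed by '-'. Labels without a
    '-' are only title-cased.
    """
    _, _, name = raw_label.partition("-")
    return (name or raw_label).replace("_", " ").title()
```

A typical use would be `[(prettify_label(label), p) for label, p in top_k_predictions(classify_dog_breed(img))]`, where `img` is a PIL image or NumPy array.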
Source Code and Further Details
The complete source code, including the Jupyter notebooks used for data processing, training, evaluation, and plotting, is available in the following GitHub repository:
This repository also contains:
- Scripts for environment setup.
- Detailed training and validation loss/accuracy plots for each model architecture.
- Outputs from test-set evaluations.
Short prediction preview of EfficientNetB0 Model
Training & Evaluation Plots
MobileNetV3Small Plots
MobileNetV2 Plots
EfficientNetB0 Plots
EfficientNetB1 Plots
Feel free to explore the notebooks to understand the implementation details and reproduce the results.