Stanford Dogs Image Classifiers

This repository contains a collection of pre-trained image classification models evaluated on the Stanford Dogs dataset. The project explores the application of transfer learning with several prominent Convolutional Neural Network (CNN) architectures to distinguish between 120 different dog breeds.

Models

The following architectures were trained and evaluated as part of this collection:

- MobileNetV2
- MobileNetV3Small
- EfficientNetB0
- EfficientNetB1

Dataset

The models were trained and evaluated on the Stanford Dogs dataset, which comprises images of 120 different dog breeds. The dataset is commonly used as a benchmark for fine-grained image classification tasks.

Methodology

The models were trained with a transfer-learning and fine-tuning approach, starting from weights pre-trained on the ImageNet dataset. The training process involved:

- Loading and preprocessing the Stanford Dogs dataset.
- Applying data augmentation techniques (including flipping, rotation, zoom, brightness, contrast, saturation, and hue adjustments) to improve model robustness.
- Utilizing callbacks such as early stopping and learning rate reduction during training.
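
The steps listed above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the notebooks' actual code: the 224x224 input size, augmentation strengths, dropout rate, learning rates, and callback patience values are all guesses, and the helper names (`build_augmentation`, `build_model`, `build_callbacks`, `unfreeze_for_fine_tuning`) are hypothetical. Only a subset of the listed augmentations is shown.

```python
import tensorflow as tf

NUM_CLASSES = 120  # Stanford Dogs has 120 breeds
IMG_SIZE = 224     # assumed input resolution; the notebooks may use another


def build_augmentation():
    """Random augmentations; these layers are only active during training."""
    return tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.1),   # assumed strength
            tf.keras.layers.RandomZoom(0.1),       # assumed strength
            tf.keras.layers.RandomContrast(0.1),   # assumed strength
        ],
        name="augmentation",
    )


def build_model(num_classes=NUM_CLASSES, weights="imagenet"):
    """EfficientNetB0 backbone (frozen) with a new classification head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    base.trainable = False  # stage 1: train only the new head

    inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = build_augmentation()(inputs)
    x = base(x, training=False)  # run BatchNorm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)  # assumed rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),  # assumed learning rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base


def build_callbacks():
    """Early stopping plus learning-rate reduction; patience values are assumed."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
    ]


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Stage 2: unfreeze the backbone and recompile with a smaller learning rate."""
    base.trainable = True
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With `train_ds` and `val_ds` as batched `tf.data.Dataset` objects of (image, integer label) pairs (both hypothetical here), training would be `model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=build_callbacks())`, followed by `unfreeze_for_fine_tuning(model, base)` and a second `fit` call. The Keras EfficientNet models include their own input rescaling, so images are passed as raw pixel values in the 0 to 255 range.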

For a detailed breakdown of the data pipeline, augmentation strategy, and training procedures for each model, please refer to the original Jupyter notebooks in the source repository.

Evaluation

Each model was evaluated on the dedicated test set of the Stanford Dogs dataset. The key performance metrics are summarized below:

| Model | Parameters | Val-Acc @10ep | Test-Acc | Macro F1 |
|---|---|---|---|---|
| MobileNetV3Small | 1.01 M | 41.7% | 55% | 0.55 |
| MobileNetV2 | 2.41 M | 75.6% | 89% | 0.89 |
| EfficientNetB0 | 4.20 M | 77.6% | 90% | 0.90 |
| EfficientNetB1 | 6.73 M | 72.4% | 77% | 0.77 |

Note: Val-Acc @10ep refers to the validation accuracy achieved after 10 training epochs. Test-Acc and Macro F1 are reported on the final test set evaluation.
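
For readers unfamiliar with the two test metrics, the following dependency-free sketch shows how accuracy and macro-averaged F1 are defined. It is illustrative only: the function name `accuracy_and_macro_f1` is hypothetical, and the notebooks may compute these values with a library instead.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)

    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(f1_scores) / len(classes)  # unweighted mean over classes
    return accuracy, macro_f1
```

Because macro F1 averages the per-class scores without weighting by class frequency, every one of the 120 breeds contributes equally to the reported value.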

Detailed classification reports (precision, recall, F1-score per class) and confusion matrices for each model can be found in the evaluation outputs of the corresponding notebooks.

Usage

```python
import sys

import cv2
import gradio as gr
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from PIL import Image

# Load the dataset metadata to obtain the class names.
# Note: tfds.load downloads and prepares the dataset on first use.
try:
    print("Loading dataset info to get class names...")
    ds_info = tfds.load('stanford_dogs', split='test', with_info=True)[1]
    class_names = ds_info.features['label'].names
    num_classes = ds_info.features['label'].num_classes
    print(f"Successfully loaded class names ({num_classes}).")
except Exception as e:
    print(f"Error loading dataset info: {e}")
    print("Could not retrieve class names. Please ensure 'stanford_dogs' is available.")
    sys.exit(1)

# Replace with the path to your trained model file.
model_path = 'path/to/your/EfficientNetB0_best_model.keras'

try:
    print(f"Loading model from {model_path}...")
    loaded_model = tf.keras.models.load_model(model_path)
    print("Model loaded successfully.")
except Exception as e:
    print(f"Error loading model: {e}")
    sys.exit(1)

# Read the expected input resolution from the model itself.
img_height, img_width = loaded_model.input_shape[1:3]
print(f"Model expects input size: {img_height}x{img_width}")


def classify_dog_breed(input_image):
    """Return a {class_name: probability} dict for a PIL image or NumPy array."""
    if isinstance(input_image, Image.Image):
        input_image = np.array(input_image)

    if not isinstance(input_image, np.ndarray):
        raise ValueError("Input is not a valid image format (expected numpy array or PIL Image).")

    # Convert grayscale or RGBA input to 3-channel RGB.
    if input_image.ndim == 2:
        input_image = cv2.cvtColor(input_image, cv2.COLOR_GRAY2RGB)
    elif input_image.shape[-1] == 4:
        input_image = cv2.cvtColor(input_image, cv2.COLOR_RGBA2RGB)

    # Resize to the model's input size and add a batch dimension.
    img_resized = tf.image.resize(input_image, (img_height, img_width)).numpy()
    img_batch = np.expand_dims(img_resized, axis=0)

    img_processed = tf.keras.applications.efficientnet.preprocess_input(img_batch)

    predictions = loaded_model.predict(img_processed, verbose=0)
    probabilities = predictions[0]

    # Map every class name to its predicted probability.
    output_scores = {class_names[i]: float(probabilities[i]) for i in range(len(class_names))}

    return output_scores


image_input = gr.Image(type="pil", label="Upload Dog Photo")

# Show the five most likely breeds.
output_label = gr.Label(num_top_classes=5)

iface = gr.Interface(
    fn=classify_dog_breed,
    inputs=image_input,
    outputs=output_label,
    title="Stanford Dogs Breed Classifier",
    description="Upload a photo of a dog and the model will predict its breed.",
)

print("Launching Gradio interface...")
iface.launch()
```

How to run this code:

1. Save the code: Copy the Python code block above and save it as a Python file (e.g., predict_dog.py) in your local environment.
2. Install dependencies: Ensure you have all the necessary libraries installed, including tensorflow-datasets, which is needed to load the class names. Open your terminal or command prompt and run:
   ```
   pip install gradio tensorflow numpy Pillow opencv-python tensorflow-datasets
   ```
3. Replace placeholders:
   - Update the model_path variable to the actual path of your trained .keras model file.
4. Run the script: Execute the Python script from your terminal:
   ```
   python predict_dog.py
   ```
5. Access the app: The script will print a local URL (usually http://127.0.0.1:7860). Open this URL in your web browser.
6. Upload and predict: Use the web interface to upload a photo of a dog. The class names are loaded from the dataset info once at startup; the interface then displays the five most likely breeds with their confidence scores.
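
The classifier function in the usage script returns a dictionary mapping every class name to a probability, and the Gradio label component shows only the top five. When calling the function outside Gradio, a small helper like the one below could do the same ranking. Both helper names are hypothetical, and `pretty_label` assumes the class names carry a WordNet-style prefix such as `n02085620-`, which may not match your copy of the dataset metadata.

```python
import re


def top_k_breeds(scores, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def pretty_label(name):
    """Strip an assumed 'n<digits>-' prefix and title-case the breed name."""
    name = re.sub(r"^n\d+-", "", name)
    return name.replace("_", " ").title()
```

For example, `[(pretty_label(n), s) for n, s in top_k_breeds(classify_dog_breed(img))]` would give a readable ranked list, where `img` is a PIL image you have opened yourself.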

Source Code and Further Details

The complete source code, including the Jupyter notebooks used for data processing, training, evaluation, and plotting, is available in the following GitHub repository:

kssrikar4/Image-Classifier

This repository also contains:

- Scripts for environment setup.
- Detailed training and validation loss/accuracy plots for each model architecture.
- Outputs from test-set evaluations.

Short prediction preview of EfficientNetB0 Model

Training & Evaluation Plots

MobileNetV3Small Plots

MobileNetV2 Plots

EfficientNetB0 Plots

EfficientNetB1 Plots

Feel free to explore the notebooks to understand the implementation details and reproduce the results.