Model Card: ROYXAI [Vision Transformer + VGG19 + ResNet50 Ensemble with Grad-CAM]

Model Description

This model is an ensemble of three deep learning architectures: Vision Transformer (ViT), VGG19, and ResNet50. The ensemble approach enhances classification performance on medical image datasets related to ocular diseases. The model also integrates Grad-CAM visualization to highlight regions of interest for better interpretability.

Model Details

  • Model Name: ROYXAI
  • Developed by: Avishek Roy Sparsho
  • Framework: PyTorch
  • Ensemble Method: Bagging
  • Backbone Models: Vision Transformer, VGG19, ResNet50
  • Target Task: Medical Image Classification
  • Supported Classes:
    • OT
    • Healthy
    • SC_diabetes
    • SC_cataract
    • SC_glucoma
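
The card names bagging as the ensemble method but does not say how the three backbone outputs are combined. The sketch below assumes one common choice, soft voting, in which each model's class probabilities are averaged; the logits are made-up values for illustration, not outputs of ROYXAI.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_vote(per_model_logits):
    """Average the class probabilities predicted by each backbone."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]

# Hypothetical logits for one image over the five classes.
logits = [
    [0.2, 2.5, 0.1, 0.3, 0.0],  # stand-in for the ViT output
    [0.1, 1.9, 0.4, 0.2, 0.1],  # stand-in for the VGG19 output
    [0.3, 0.8, 1.1, 0.2, 0.0],  # stand-in for the ResNet50 output
]
avg = soft_vote(logits)
best = max(range(len(avg)), key=avg.__getitem__)
print("winning class index:", best)
```

Averaging probabilities rather than raw logits keeps one backbone with large-magnitude scores from dominating the vote.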

Model Sources

  • Repository: https://huggingface.co/Aviroy/ROYXAI

Uses

Direct Use

This model is designed for medical image classification, detecting and visualizing ocular diseases and their secondary complications.

Downstream Use

The model can be fine-tuned on other medical image datasets to improve performance on specific conditions.

Out-of-Scope Use

The model is not suitable for non-medical image classification tasks or for use as a standalone medical diagnostic tool.

Bias, Risks, and Limitations

  • This model is trained on a specific dataset and may not generalize well to other medical image datasets without fine-tuning.
  • It is not a substitute for professional medical diagnosis.
  • The Vision Transformer model is computationally expensive compared to CNNs.

Training Details

Dataset

  • Dataset Name: Custom Ocular Disease and its Secondary complications Dataset
  • Dataset Source: Private Dataset (Medical Images)
  • Dataset Structure: Images stored in folders based on class labels
  • Preprocessing:
    • Resized images to 224x224 pixels
    • Normalized using ImageNet mean and standard deviation
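
As a worked example of the normalization step, the sketch below applies the ImageNet statistics to a single 8-bit RGB pixel: each channel is scaled to [0, 1] and then standardized. The helper name `normalize_pixel` is illustrative only; in practice torchvision's `ToTensor` and `Normalize` apply the same arithmetic to the whole image.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A mid-gray pixel ends up close to zero in every channel.
print(normalize_pixel((128, 128, 128)))
```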

Training Procedure

  • Optimizer: Adam with weight decay
  • Learning Rate Scheduler: Cosine Annealing LR
  • Loss Function: Cross-Entropy Loss
  • Batch Size: 32
  • Training Epochs: 20
  • Hardware Used: T4 GPU x2
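
To illustrate the scheduler, the sketch below evaluates the closed-form cosine annealing schedule over the 20 training epochs. The base learning rate is a hypothetical value, since the card does not state one, and the minimum rate is assumed to be zero.

```python
import math

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: decays base_lr to eta_min over t_max epochs."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)

BASE_LR = 1e-4  # hypothetical; the card does not state the learning rate
T_MAX = 20      # matches the 20 training epochs listed above

for epoch in (0, 5, 10, 15, 20):
    print(epoch, f"{cosine_annealing_lr(epoch, BASE_LR, T_MAX):.2e}")
```

The rate starts at the base value, passes through half of it at the midpoint, and reaches the minimum at the final epoch.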

Model Performance

  • Accuracy: 98% on the test dataset
  • Precision/Recall/F1-score: Evaluated and optimized for medical diagnosis
  • Overfitting Prevention: Implemented data augmentation, dropout, weight regularization
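
For reference, per-class precision, recall, and F1 can be computed from true and predicted labels as sketched below. The labels are invented for illustration and are not results from this model.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical ground truth and predictions for six images.
y_true = ["Healthy", "Healthy", "OT", "SC_cataract", "OT", "SC_glucoma"]
y_pred = ["Healthy", "OT", "OT", "SC_cataract", "OT", "SC_glucoma"]

for name in ("Healthy", "OT"):
    print(name, per_class_metrics(y_true, y_pred, name))
```

Reporting these per class matters in medical settings because a high overall accuracy can hide poor recall on a rare condition.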

Installation and Usage

Clone the Repository

git clone https://huggingface.co/Aviroy/ROYXAI
cd ROYXAI

Install Dependencies

pip install -r requirements.txt

Training the Model

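To train the model from scratch, run the command below. It sets 50 epochs, whereas the training run reported above used 20, so adjust --epochs as needed: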

python train.py --epochs 50 --batch_size 32

Load Pretrained Model

To directly use the trained model:

import torch
from PIL import Image
import torchvision.transforms as transforms
from model import ensemble_model  # Load the trained ensemble model

# Define image transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Select the device and move the model to it so model and input match
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
ensemble_model.to(device)

# Load and preprocess an image
image_path = "path/to/image.jpg"
image = Image.open(image_path).convert('RGB')
image = transform(image).unsqueeze(0).to(device)  # add a batch dimension

# Perform inference
ensemble_model.eval()
with torch.no_grad():
    output = ensemble_model(image)
    predicted_class = torch.argmax(output, dim=1).item()

# Print classification result
print("Predicted Class:", predicted_class)
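
The snippet above prints a class index rather than a name. The sketch below maps an index to a label under two candidate orderings, because the card does not state which index belongs to which class: the order listed under Supported Classes, and the alphabetical folder order that torchvision's `ImageFolder` would assign if the dataset was loaded that way. Both orderings are assumptions; check the training code before relying on either.

```python
# Order as listed under "Supported Classes" (an assumption about index order).
LISTED_ORDER = ["OT", "Healthy", "SC_diabetes", "SC_cataract", "SC_glucoma"]

# Order ImageFolder would assign: class folder names sorted as strings.
FOLDER_ORDER = sorted(LISTED_ORDER)

def label_for(index, class_names):
    """Map a predicted class index to its label, rejecting invalid indices."""
    if not 0 <= index < len(class_names):
        raise IndexError(f"class index {index} is out of range")
    return class_names[index]

predicted_class = 1  # stand-in for the index produced by the inference code
print("listed order:", label_for(predicted_class, LISTED_ORDER))
print("folder order:", label_for(predicted_class, FOLDER_ORDER))
```

The two orderings disagree for most indices, so confirming the mapping is worth doing before interpreting any prediction.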

Grad-CAM Visualization

Visualizing Attention Maps for Interpretability
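
The internals of `visualize_gradcam` and `visualize_gradcam_vit` are not shown in this card. As background, the sketch below follows the standard Grad-CAM formulation on a toy feature map: each channel is weighted by the spatial mean of its gradient, the weighted channels are summed, negative values are clipped, and the map is scaled to [0, 1]. It uses plain Python lists and invented numbers, so it illustrates the technique rather than the repository's implementation.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from [channels][height][width] nested lists."""
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight: spatial mean of the class-score gradient for that channel.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum over channels, followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Scale to [0, 1] so the map can be overlaid on the image as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam

# Toy example: two channels over a 2x2 feature map.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(activations, gradients)
print(cam)
```

In a real pipeline the activations and gradients come from a chosen convolutional layer, and the low-resolution map is upsampled to the input size before overlaying.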

Vision Transformer (ViT)

from visualization import visualize_gradcam_vit  # Function for ViT Grad-CAM

# Generate Grad-CAM visualization
overlay = visualize_gradcam_vit(ensemble_model.models[0], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for Vision Transformer")
plt.show()

ResNet50

from visualization import visualize_gradcam  # General Grad-CAM function

# Generate Grad-CAM visualization for ResNet50
overlay = visualize_gradcam(ensemble_model.models[2], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for ResNet50")
plt.show()

VGG19

from visualization import visualize_gradcam  # General Grad-CAM function

# Generate Grad-CAM visualization for VGG19
overlay = visualize_gradcam(ensemble_model.models[1], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for VGG19")
plt.show()

Environmental Impact

  • Hardware Type: T4 GPU x2
  • Hours used: 50
  • Cloud Provider: Google Cloud (GCP)
  • Compute Region: US-Central1
  • Carbon Emitted: Estimated using Machine Learning Impact Calculator
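
No emission figure is given above. The sketch below shows the kind of arithmetic such an estimate rests on: energy is power times GPU count times hours, and emissions are energy times the grid's carbon intensity. It assumes the 50 hours are wall-clock time with both GPUs running at the T4's 70 W board power, and the carbon intensity is a hypothetical placeholder, not a measured value for the listed region.

```python
GPU_POWER_KW = 0.070    # NVIDIA T4 board power of 70 W, expressed in kW
NUM_GPUS = 2
HOURS = 50              # assumed wall-clock hours with both GPUs busy
CARBON_INTENSITY = 0.5  # kg CO2eq per kWh; hypothetical placeholder

def estimate_emissions(power_kw, num_gpus, hours, intensity):
    """Return (energy in kWh, emissions in kg CO2eq) for a training run."""
    energy_kwh = power_kw * num_gpus * hours
    return energy_kwh, energy_kwh * intensity

energy, emissions = estimate_emissions(
    GPU_POWER_KW, NUM_GPUS, HOURS, CARBON_INTENSITY
)
print(f"energy: {energy:.1f} kWh, emissions: {emissions:.1f} kg CO2eq")
```

This gives an upper bound under the stated assumptions, since GPUs rarely draw full board power for an entire run.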

Citation

If you use this model in your research, please cite:

@article{Sparsho2025,
  author    = {Avishek Roy Sparsho},
  title     = {ROYXAI Model For Proper Visualization of Classified Medical Image},
  journal   = {Medical AI Research},
  year      = {2025}
}

Acknowledgments

Special thanks to the open-source community and Kaggle for providing medical datasets for deep learning research.

Contact

For inquiries, please contact: Avishek Roy Sparsho

License

This model is released under the Apache 2.0 License. Use it responsibly.
