# ViT-Base-Patch16-224-in21k Fine-tuned on Skin Disease Image Dataset

## Model Description

This model is a fine-tuned version of the `google/vit-base-patch16-224-in21k` Vision Transformer (ViT) model for image classification. It has been fine-tuned on a custom skin disease image dataset containing 10 classes of skin diseases.

## Intended Use

The model classifies images of skin lesions into one of 10 categories of skin diseases. It can be used for educational purposes, research, or as a starting point for further fine-tuning.

**Note:** This model is not intended for clinical or diagnostic use. Always consult a qualified healthcare professional for medical advice.

## How to Use

### Installation

Ensure you have the following packages installed:

```bash
pip install transformers
pip install torch
pip install torchvision
pip install Pillow
```

### Loading the Model

```python
import torch
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image

# Load the fine-tuned model
model_name = 'your_username/your_model_name'  # Replace with your actual model path on Hugging Face Hub
model = ViTForImageClassification.from_pretrained(model_name)
model.eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Load the image processor
image_processor = ViTImageProcessor.from_pretrained(model_name)
```

### Making Predictions
```python
def predict(image_path):
    # Load and preprocess the image
    image = Image.open(image_path).convert('RGB')
    inputs = image_processor(images=image, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        predicted_class_idx = logits.argmax(-1).item()
        # id2label is keyed by int once the config has been loaded
        predicted_class = model.config.id2label[predicted_class_idx]
    return predicted_class

# Example usage
image_path = 'path/to/your/image.jpg'  # Replace with the path to your image
predicted_class = predict(image_path)
print(f"Predicted class: {predicted_class}")
```

## Labels

The model predicts one of the following classes:

- Eczema
- Warts
- Melanoma
- Atopic Dermatitis
- Basal Cell Carcinoma
- Melanocytic Nevi
- Benign Keratosis-like Lesions
- Psoriasis
- Seborrheic Keratoses
- Fungal Infections

## Dataset

The model was trained on the Skin Diseases Image Dataset available on Kaggle.

### Dataset Details

- Number of Classes: 10
- Total Images: Approximately 40,000
- Classes Included:
  - Eczema
  - Warts, Molluscum, and other Viral Infections
  - Melanoma
  - Atopic Dermatitis
  - Basal Cell Carcinoma
  - Melanocytic Nevi
  - Benign Keratosis-like Lesions
  - Psoriasis, Lichen Planus, and related diseases
  - Seborrheic Keratoses and other Benign Tumors
  - Tinea, Ringworm, Candidiasis, and other Fungal Infections

### Data Preprocessing

- Image Size: Resized to 224x224 pixels
- Normalization: Images are normalized using ImageNet statistics:
  - Mean: [0.485, 0.456, 0.406]
  - Standard Deviation: [0.229, 0.224, 0.225]

### Data Splits

- Training Set: 70%
- Validation Set: 15%
- Test Set: 15%

The data was split in a stratified manner to maintain the class distribution across all splits.

## Training Procedure

- Base Model: `google/vit-base-patch16-224-in21k`
- Framework: PyTorch with Hugging Face Transformers
- Optimizer: AdamW
- Learning Rate: 5e-5
- Batch Size: 16
- Number of Epochs: 5
- Loss Function: Cross-Entropy Loss

### Training Steps

1. Model Initialization:
   - Loaded the pre-trained ViT model.
   - Adjusted the classifier head to match the number of classes.
2. Data Loading:
   - Created custom datasets for training, validation, and testing.
   - Used `DataLoader` with appropriate batch sizes.
3. Training Loop, for each epoch:
   - Training Phase: Processed batches of images and labels, computed the loss, and performed backpropagation.
   - Validation Phase: Evaluated model performance on the validation set.
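The training steps above can be sketched as a PyTorch loop. This is a minimal illustration under stated assumptions, not the original training script: the helper names `run_epoch` and `fit` are hypothetical, each batch is assumed to be a `(pixel_values, labels)` tuple, and the model is assumed to return an object with a `.logits` attribute, as `ViTForImageClassification` does. The optimizer, learning rate, epoch count, and loss follow the values listed under Training Procedure.

```python
import torch
from torch import nn


def run_epoch(model, loader, device, optimizer=None):
    """Run one pass over `loader`; train if an optimizer is given, else evaluate.

    Returns (mean loss, accuracy). Assumes each batch is a
    (pixel_values, labels) tuple and that the model output exposes `.logits`.
    """
    training = optimizer is not None
    model.train(training)
    criterion = nn.CrossEntropyLoss()
    total_loss, correct, seen = 0.0, 0, 0

    # Gradients are only tracked during the training phase.
    with torch.set_grad_enabled(training):
        for pixel_values, labels in loader:
            pixel_values, labels = pixel_values.to(device), labels.to(device)
            logits = model(pixel_values=pixel_values).logits
            loss = criterion(logits, labels)

            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=-1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, val_loader, device, epochs=5, lr=5e-5):
    """Hypothetical driver using AdamW, lr 5e-5, and 5 epochs by default."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        train_loss, train_acc = run_epoch(model, train_loader, device, optimizer)
        val_loss, val_acc = run_epoch(model, val_loader, device)
        history.append((train_loss, train_acc, val_loss, val_acc))
        print(f"epoch {epoch + 1}: train_loss={train_loss:.4f} val_acc={val_acc:.4f}")
    return history
```

One way to obtain a model with a 10-class head for such a loop is `ViTForImageClassification.from_pretrained('google/vit-base-patch16-224-in21k', num_labels=10)`; whether the original training used exactly this call is not stated here.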
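The stratified 70/15/15 split described under Data Splits can be illustrated in plain Python. This is a sketch only: the function name `stratified_split`, the fixed seed, and the rounding of per-class counts are assumptions made for illustration and are not taken from the original training code.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.70, val_frac=0.15, seed=42):
    """Return (train, val, test) index lists that preserve class proportions.

    Hypothetical helper: the name, the seed, and the rounding rule are
    illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = round(len(indices) * val_frac)
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Toy labels: 100 samples of class 0 and 40 samples of class 1.
    toy_labels = [0] * 100 + [1] * 40
    train_idx, val_idx, test_idx = stratified_split(toy_labels)
    print(len(train_idx), len(val_idx), len(test_idx))
```

Because each class is split separately, every subset keeps roughly the same class ratio as the full dataset. In practice, calling scikit-learn's `train_test_split` twice with its `stratify` argument achieves a comparable result.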
## Evaluation Results

- Validation Accuracy: Approximately 70%
- Test Accuracy: Approximately 71%

### Performance Observations

At roughly 70% accuracy, the model performs well above the 10% expected from uniform random guessing over 10 classes, but it still misclassifies about three in ten images. Further training, hyperparameter tuning, or data augmentation may improve results.

## Limitations

- Non-Clinical Use: The model is not suitable for clinical diagnostics.
- Data Bias: Class imbalance in the dataset may bias predictions.
- Generalization: The model may not perform well on images outside the dataset domain (e.g., different lighting conditions or image quality).

## Ethical Considerations

- Privacy: Ensure that any images used with this model comply with privacy regulations and that patients cannot be identified.
- Responsibility: Use the model responsibly, acknowledging its limitations and the potential consequences of misclassification.

## Citation

If you use this model, please cite it as:

```bibtex
@misc{vit_skin_disease_model,
  title={ViT Fine-tuned on Skin Disease Image Dataset},
  author={Your Name},
  year={2023},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your_username/your_model_name}},
}
```

## References

- Vision Transformer (ViT) Paper
- Hugging Face ViT Documentation