---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---
# CBIS-DDSM-CNN
CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.
The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.
## Model Details
### Model Description
- **Developed by:** Lorenzo Maiuri
- **Funded by:** None
- **Shared by:** Lorenzo Maiuri
- **Model type:** Image Classification
- **License:** MIT
### Model Sources
- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
- **Demo:** Coming soon...
## Uses
### Try It Out
Coming soon...
### Direct Use
```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0
        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)
    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```
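The snippet above reads `predictions[0][0]` and assumes "Cancer" is the first output. Because training used categorical cross-entropy, the model most likely ends in a softmax over classes, and which column holds the cancer probability depends on a label encoding this card does not document. The hypothetical helper `interpret_prediction` below is a minimal sketch that turns that assumption into an explicit parameter and also copes with a single sigmoid output.

```python
import numpy as np


def interpret_prediction(probs, cancer_index=0, threshold=0.5):
    """Map one row of model output to a (label, cancer_probability) pair.

    Hypothetical helper: `cancer_index` is an assumption, since the label
    encoding used during training is not documented in this card.
    """
    probs = np.asarray(probs, dtype=np.float32).ravel()
    if probs.size == 1:
        # Single sigmoid unit: the value is taken as P(cancer).
        cancer_probability = float(probs[0])
    else:
        # Softmax over classes: pick the column assumed to be "Cancer".
        cancer_probability = float(probs[cancer_index])
    label = "Cancer" if cancer_probability >= threshold else "Normal"
    return label, cancer_probability
```

For example, `label, p = interpret_prediction(model.predict(img_batch)[0])`; if the cancer class was encoded as 1 during training, pass `cancer_index=1` instead.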
### Downstream Use
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline, or as a starting point for transfer learning, in medical imaging tasks.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.
### Out-of-Scope Use
🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.
## Bias, Risks, and Limitations
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.
### Recommendations
- Pre-training on larger, more diverse datasets: Improves generalization across different patient populations.
- Explainability tools: Methods such as Grad-CAM or SHAP can help radiologists interpret predictions.
- Continuous evaluation: Validate with real-world clinical data before any integration into healthcare systems.
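The Grad-CAM recommendation can be sketched with TensorFlow's `GradientTape`. This is a generic implementation of the technique, not code shipped with this repository. It assumes a model with a single input and a single output tensor; the name of the convolutional layer to inspect is an assumption (list candidates with `model.summary()`), and `class_index` depends on the same undocumented label encoding as the prediction snippet.

```python
import numpy as np
import tensorflow as tf


def grad_cam(model, image, conv_layer_name, class_index):
    """Return a Grad-CAM heatmap in [0, 1] for one preprocessed image.

    `conv_layer_name` must name a convolutional layer of `model`; the
    layer names inside CNN_model.h5 are not documented in this card.
    """
    conv_layer = model.get_layer(conv_layer_name)
    # Auxiliary model exposing both the conv feature maps and the final output.
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])
    batch = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(batch)
        class_score = preds[:, class_index]
    # Gradient of the class score with respect to each feature map.
    grads = tape.gradient(class_score, conv_maps)
    # One weight per channel: global average of its gradients.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps; ReLU keeps only positive evidence.
    cam = tf.nn.relu(tf.reduce_sum(conv_maps[0] * weights, axis=-1))
    # Scale to [0, 1]; the epsilon avoids division by zero for an all-zero map.
    cam = cam / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()
```

The returned map has the spatial size of the chosen layer's output, so it needs to be upsampled (for instance with `cv2.resize`) before being overlaid on the mammogram.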
## Training Details
### Training Data
- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments
### Training Procedure
- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout)
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Validation Split**: 20%
- **Callbacks**: Early Stopping, ReduceLROnPlateau
#### Preprocessing
- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization
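The preprocessing steps above (augmentation aside, which happens during training) can be illustrated with a dependency-free NumPy sketch. The card does not say which contrast-enhancement method was used, so a percentile stretch stands in for it here as an assumption, and nearest-neighbour sampling replaces the `INTER_AREA` resize used in the Direct Use snippet. The Direct Use snippet also keeps three RGB channels and skips contrast enhancement, so check the model's expected input shape (for example with `model.summary()`) before deciding which variant matches the released model.

```python
import numpy as np


def preprocess(rgb, size=(50, 50)):
    """Sketch of grayscale -> contrast stretch -> resize -> [0, 1] scaling.

    `rgb` is an (H, W, 3) uint8 array; `size` is (height, width).
    The percentile stretch is an assumed stand-in for the original method.
    """
    img = rgb.astype(np.float32)
    # 1. Grayscale conversion with ITU-R BT.601 luma weights.
    gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # 2. Contrast enhancement: map the 1st-99th percentile range onto 0-255.
    lo, hi = np.percentile(gray, (1, 99))
    gray = np.clip((gray - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
    # 3. Resize with nearest-neighbour sampling of rows and columns.
    h, w = gray.shape
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    small = gray[rows[:, None], cols[None, :]]
    # 4. Normalize to [0, 1] and add a trailing channel axis.
    return (small / 255.0)[..., np.newaxis].astype(np.float32)
```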
#### Training Hyperparameters
- **Epochs:** 20
- **Batch Size:** 75
- **Learning Rate:** 0.001
- **Optimizer:** Adam
- **Dropout Rate:** 0.4
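The architecture, loss, optimizer, callbacks and hyperparameters listed above can be assembled into a Keras training setup along the following lines. This is a hypothetical reconstruction, not the author's notebook code: filter counts, kernel sizes, pooling, the dense head, augmentation strengths and callback patience values are all assumptions, and expressing augmentation as Keras preprocessing layers is only one possible approach. The layer types, loss, optimizer, learning rate, dropout rate, epochs, batch size and validation split come from this card. The default input shape mirrors the Direct Use snippet's three-channel 50 x 50 input; use one channel if the model was trained on grayscale images.

```python
import tensorflow as tf


def build_model(input_shape=(50, 50, 3), num_classes=2, dropout_rate=0.4):
    """Hypothetical CNN: 4 conv blocks with BatchNorm and Dropout."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    # Augmentation layers are only active when training=True (e.g. in fit()).
    model.add(tf.keras.layers.RandomFlip("horizontal_and_vertical"))
    model.add(tf.keras.layers.RandomRotation(0.1))  # assumed strength
    model.add(tf.keras.layers.RandomZoom(0.1))  # assumed strength
    # Four convolutional blocks; filter counts are assumptions.
    for filters in (32, 64, 128, 256):
        model.add(tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.MaxPooling2D())
        model.add(tf.keras.layers.Dropout(dropout_rate))
    model.add(tf.keras.layers.Flatten())
    # Softmax head to pair with categorical cross-entropy on one-hot labels.
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Patience and factor values are assumptions; the card only names the callbacks.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

# Usage with hypothetical arrays x_train and one-hot y_train:
# model = build_model()
# model.fit(x_train, y_train, epochs=20, batch_size=75,
#           validation_split=0.2, callbacks=callbacks)
```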
#### Speeds, Sizes, Times
- Total Training Time: 33m
- Hardware Used: Tesla P100
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on the test split of the CBIS-DDSM dataset.
#### Metrics
The following metrics were computed for evaluation:
- Accuracy
- Confusion Matrix
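The card reports accuracy and a confusion matrix, and its metadata also lists precision and recall. For a screening task, recall on the cancer class (the share of cancers actually flagged) matters as much as overall accuracy. The stdlib sketch below derives all four from true and predicted labels, assuming cancer is encoded as `1` (a hypothetical encoding; pass `positive=0` otherwise). scikit-learn's `confusion_matrix`, `precision_score` and `recall_score` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion matrix plus accuracy, precision and recall for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # cancer correctly flagged
            else:
                fp += 1  # false alarm
        else:
            if t == positive:
                fn += 1  # missed cancer
            else:
                tn += 1  # correctly cleared
    total = tp + fp + tn + fn
    return {
        # Rows: actual (negative, positive); columns: predicted (negative, positive).
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```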
### Results
- Accuracy: 0.9789
#### Summary
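The model reaches an accuracy of 0.9789 on the test split. Accuracy alone does not show how errors divide between false positives and false negatives, so it should be read together with the confusion matrix, precision and recall before drawing conclusions about screening performance.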
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** Tesla P100
- **Hours used:** 0.55 (33 minutes)
- **Cloud Provider:** Kaggle
- **Carbon Emitted:** 0.04
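The calculator's estimate essentially multiplies the energy consumed by the carbon intensity of the local electricity grid. A minimal sketch of that arithmetic follows. The inputs are assumptions: the PCIe variant of the Tesla P100 is rated at 250 W, and the grid intensity for the region where the job ran is not stated in this card.

```python
def estimate_emissions_kg(power_watts, hours, kg_co2eq_per_kwh):
    """Rough emissions estimate: energy (kWh) times grid carbon intensity."""
    # Convert watts to kilowatts, then multiply by runtime to get kWh.
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2eq_per_kwh
```

For example, `estimate_emissions_kg(250, 33 / 60, intensity)` with `intensity` set to the grid factor of the relevant region, in kg CO2eq per kWh.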
## Citation
If you use this model, please cite it as follows:
```bibtex
@misc{CBIS-DDSM-CNN,
  author       = {Lorenzo Maiuri},
  title        = {CBIS-DDSM-CNN},
  year         = {2025},
  publisher    = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN}},
  license      = {MIT}
}
```