---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---
# CBIS-DDSM-CNN
CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.
The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Lorenzo Maiuri
- **Funded by:** No funds
- **Shared by:** Lorenzo Maiuri
- **Model type:** Image Classification
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
- **Demo:** Coming soon...
## Uses
### Try It Out
Coming soon...
### Direct Use
```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0
        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)
    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"
    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```
### Downstream Use
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.
### Out-of-Scope Use
🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.
## Bias, Risks, and Limitations
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.
### Recommendations
- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
- Explainability tools: Such as Grad-CAM or SHAP to help radiologists interpret predictions.
- Continuous evaluation: With real-world clinical data before integration into healthcare systems.
## Training Details
### Training Data
- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout)
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Validation Split**: 20%
- **Callbacks**: Early Stopping, ReduceLROnPlateau
#### Preprocessing
- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization
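
The preprocessing steps listed above can be sketched with NumPy alone. This is an illustrative sketch, not the pipeline used for training: the card does not say which contrast-enhancement method was applied, so a percentile-based contrast stretch is used here as a stand-in, and a nearest-neighbour resize replaces whatever interpolation the training code used. All function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = (50, 50)  # (height, width), taken from the preprocessing list


def to_grayscale(img_rgb):
    """Convert an RGB image (H, W, 3) to float32 grayscale (H, W).

    Uses the ITU-R BT.601 luma weights (an assumption; the card does not
    specify the conversion).
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return img_rgb.astype(np.float32) @ weights


def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Percentile contrast stretch (stand-in for the unspecified method).

    The output is clipped to [0, 1], so normalization is folded into this step.
    """
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(gray, dtype=np.float32)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0).astype(np.float32)


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize to `size` = (height, width)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]


def preprocess(img_rgb):
    """Grayscale -> contrast stretch (values in [0, 1]) -> resize."""
    gray = to_grayscale(img_rgb)
    enhanced = stretch_contrast(gray)
    return resize_nearest(enhanced)


def augment_flip(img, rng):
    """Flip horizontally with probability 0.5 (one of the listed augmentations)."""
    return img[:, ::-1] if rng.random() < 0.5 else img
```

Rotation and zoom augmentation are omitted to keep the sketch short. Note that the Direct Use snippet feeds three-channel RGB input to the model, so the grayscale step here follows the preprocessing list rather than that snippet.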
#### Training Hyperparameters
- **Epochs:** 20
- **Batch Size:** 75
- **Learning Rate:** 0.001
- **Optimizer:** Adam
- **Dropout Rate:** 0.4
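
The architecture and hyperparameters listed above could be wired together in Keras roughly as follows. This is a sketch under stated assumptions, not the published model: the filter counts, kernel size, pooling layers, dropout placement, callback patience values, and the reduction factor are all illustrative guesses, and the three-channel input shape follows the Direct Use snippet. Only the number of convolutional layers, the loss, the optimizer, the learning rate, and the dropout rate come from this card.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(input_shape=(50, 50, 3), num_classes=2, dropout_rate=0.4):
    """Four Conv2D blocks with BatchNorm, then Dropout and a softmax head."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128, 256):  # filter counts are assumptions
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))  # pooling is an assumption
    model.add(layers.Flatten())
    model.add(layers.Dropout(dropout_rate))  # rate from the card
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_callbacks():
    """Early stopping and LR reduction; patience and factor are assumptions."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
    ]


# Hypothetical training call; x_train / y_train (one-hot labels) are not defined here:
# model = build_cnn()
# model.fit(x_train, y_train, validation_split=0.2, epochs=20,
#           batch_size=75, callbacks=make_callbacks())
```

With categorical cross-entropy the labels must be one-hot encoded, which is why the head has one softmax unit per class.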
#### Speeds, Sizes, Times
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- Total Training Time: 33m
- Hardware Used: Tesla P100
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The model was evaluated on the test split of the CBIS-DDSM dataset.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
The following metrics were computed for evaluation:
- Accuracy
- Confusion Matrix
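
As an illustration of how these metrics relate, the sketch below builds a binary confusion matrix and derives accuracy from it, together with the precision and recall listed in the card metadata. The labels are toy values, not model outputs, and the encoding (1 = cancerous, 0 = non-cancerous) is an assumption made for this example only.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def summarize(cm, positive=1):
    """Derive accuracy, precision and recall for the `positive` class."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp  # predicted positive, actually negative
    fn = cm[positive, :].sum() - tp  # actually positive, predicted negative
    return {
        "accuracy": float(np.trace(cm) / cm.sum()),
        "precision": float(tp / (tp + fp)) if (tp + fp) else 0.0,
        "recall": float(tp / (tp + fn)) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (assumed encoding: 1 = cancerous)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

In a screening context, recall on the cancerous class is usually the figure to watch, since a false negative corresponds to a missed cancer.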
### Results
- Accuracy: 0.9789
#### Summary
The model reaches an accuracy of 0.9789 on its test split. Accuracy alone does not show how errors are divided between false positives and false negatives, so the confusion matrix should be reviewed alongside it.
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** Tesla P100
- **Hours used:** 0.33
- **Cloud Provider:** Kaggle
- **Carbon Emitted:** 0.04
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you use this model, please cite it as follows:
```
@misc{CBIS-DDSM-CNN,
author = {Lorenzo Maiuri},
title = {CBIS-DDSM-CNN},
year = {2025},
publisher = {Hugging Face Hub},
license = {MIT}
}
```