---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---
|
# CBIS-DDSM-CNN |
|
|
|
CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research. |
|
|
|
The model classifies mammograms as cancerous or non-cancerous, with the aim of supporting early detection and diagnosis.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** Lorenzo Maiuri |
|
- **Funded by:** No funds |
|
- **Shared by:** Lorenzo Maiuri |
|
- **Model type:** Image Classification |
|
- **License:** MIT |
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN) |
|
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset) |
|
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images) |
|
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn) |
|
- **Demo:** Coming soon... |
|
|
|
|
## Uses |
|
|
|
### Try It Out |
|
|
|
Coming soon... |
|
|
|
|
### Direct Use |
|
|
|
```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Download and load the trained model from the Hub
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load the preprocessing settings stored alongside the model
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)


def load_and_preprocess_image(image_path):
    """Read an image, convert it to RGB, resize it, and scale it to [0, 1]."""
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")

        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        return img.astype(np.float32) / 255.0
    except Exception as e:
        print(f"Error processing {image_path}: {e}")
        return None


# Load and preprocess an example image
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    # Add a batch dimension: (H, W, 3) -> (1, H, W, 3)
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)

    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f"Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}")
    plt.axis("off")
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```
|
|
|
### Downstream Use |
|
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.
|
|
|
### Out-of-Scope Use |
|
|
|
🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only. |
|
|
|
## Bias, Risks, and Limitations |
|
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics. |
|
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice. |
|
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions. |
|
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm. |
|
|
|
### Recommendations |
|
|
|
- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
- Explainability tools: Such as Grad-CAM or SHAP, to help radiologists interpret predictions.
- Continuous evaluation: With real-world clinical data before integration into healthcare systems.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
- Dataset: Breast Histopathology Images |
|
- Image Types: High-resolution mammograms |
|
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal) |
|
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments |
|
|
|
### Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout) |
|
- **Loss Function**: Categorical Cross-Entropy |
|
- **Optimizer**: Adam |
|
- **Validation Split**: 20% |
|
- **Callbacks**: Early Stopping, ReduceLROnPlateau |
|
|
|
#### Preprocessing |
|
- Grayscale conversion for reduced complexity |
|
- Contrast enhancement for better lesion visibility |
|
- Image resizing to (50, 50) pixels |
|
- Normalization (scaling pixel values between 0 and 1) |
|
- Data augmentation (flipping, rotation, zooming) to improve generalization |
|
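The preprocessing steps listed above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the code used for training: the function names are hypothetical, the grayscale conversion uses the common BT.601 luma weights, a min-max stretch stands in for the unspecified contrast enhancement (it also scales values to [0, 1]), resizing is nearest-neighbour, rotation is limited to multiples of 90 degrees, and zooming is left out.

```python
import numpy as np

TARGET_SIZE = (50, 50)  # (height, width), as stated in the preprocessing list


def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array to (H, W) using BT.601 luma weights (assumed)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights


def stretch_contrast(gray):
    """Min-max stretch to [0, 1]; a stand-in for the unspecified enhancement."""
    lo, hi = float(gray.min()), float(gray.max())
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros_like(gray, dtype=np.float32)
    return ((gray - lo) / (hi - lo)).astype(np.float32)


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize by index sampling (no OpenCV dependency)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]


def preprocess(rgb):
    """Grayscale -> contrast stretch (values end up in [0, 1]) -> resize."""
    return resize_nearest(stretch_contrast(to_grayscale(rgb)))


def augment(img, rng):
    """Random horizontal flip plus a random rotation by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return np.rot90(img, k=int(rng.integers(0, 4)))


rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(200, 120, 3), dtype=np.uint8)  # synthetic input
x = preprocess(fake_image)
x_aug = augment(x, rng)
print(x.shape, x_aug.shape)
```

Augmentation is applied only to training images; validation and test images go through `preprocess` alone.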
|
|
#### Training Hyperparameters |
|
|
|
- **Epochs:** 20 |
|
- **Batch Size:** 75 |
|
- **Learning Rate:** 0.001 |
|
- **Optimizer:** Adam |
|
- **Dropout Rate:** 0.4 |
|
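To show how the settings above fit together, here is a minimal Keras sketch. Only the layer count, loss, optimizer, learning rate, dropout rate, epochs, batch size, validation split, and callback types come from this card. The filter counts, kernel size, pooling, the 50x50x3 input shape, the two-unit softmax head, and the callback `patience` and `factor` values are assumptions made for illustration, and `build_model` and `build_callbacks` are hypothetical names.

```python
import tensorflow as tf

# Values stated in this card
EPOCHS = 20
BATCH_SIZE = 75
LEARNING_RATE = 0.001
DROPOUT_RATE = 0.4
VALIDATION_SPLIT = 0.2


def build_model(input_shape=(50, 50, 3), num_classes=2):
    """Four Conv2D blocks with BatchNorm and Dropout; layer sizes are assumed."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128, 256):  # assumed filter counts
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(DROPOUT_RATE))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def build_callbacks():
    """Early stopping and learning-rate reduction; patience/factor are assumed."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
    ]


model = build_model()
model.summary()
```

With one-hot encoded labels, training would then look like `model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, validation_split=VALIDATION_SPLIT, callbacks=build_callbacks())`, where `x_train` and `y_train` are placeholders for the prepared data.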
|
|
#### Speeds, Sizes, Times |
|
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
|
|
|
- Total Training Time: 33m |
|
- Hardware Used: Tesla P100 |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
The model was evaluated on the test split of the CBIS-DDSM dataset.
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
The following metrics were computed for evaluation: |
|
- Accuracy |
|
- Confusion Matrix |
|
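As a reference for how these metrics relate, the sketch below derives accuracy, precision, and recall from the four confusion-matrix counts. The label encoding (1 = cancer, 0 = non-cancer), the function names, and the toy labels are assumptions for illustration and are not taken from the evaluation run.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = cancer (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, not real evaluation data
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
print(summarize(y_true, y_pred))  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

In a screening setting, recall is the quantity tied to missed cancers (false negatives), which is why it is worth reading next to accuracy.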
|
|
### Results |
|
|
|
- Accuracy: 0.9789 |
|
|
|
#### Summary |
|
|
|
The model reaches an accuracy of 0.9789 on the CBIS-DDSM test split. Precision and recall values are not reported in this card, so the balance between false positives and false negatives cannot be judged from this figure alone.
|
|
|
## Environmental Impact |
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** Tesla P100 |
|
- **Hours used:** 0.33 |
|
- **Cloud Provider:** Kaggle |
|
- **Carbon Emitted:** 0.04 |
|
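The kind of estimate such a calculator produces can be approximated as power draw times runtime times the carbon intensity of the grid. The sketch below is a rough illustration only: the 250 W power draw for the Tesla P100 and the 0.4 kg CO2eq/kWh carbon intensity are assumed placeholder values, `estimate_emissions_kg` is a hypothetical helper, and only the 0.33 hours comes from this card.

```python
def estimate_emissions_kg(power_watts, hours, kg_co2_per_kwh):
    """Estimate emissions in kg CO2eq as energy (kWh) times grid carbon intensity."""
    energy_kwh = (power_watts / 1000.0) * hours
    return energy_kwh * kg_co2_per_kwh


P100_POWER_WATTS = 250.0   # assumed board power, not stated in this card
HOURS_USED = 0.33          # from the list above
GRID_INTENSITY = 0.4       # assumed kg CO2eq per kWh; varies by region

estimate = estimate_emissions_kg(P100_POWER_WATTS, HOURS_USED, GRID_INTENSITY)
print(f"Estimated emissions: {estimate:.3f} kg CO2eq")
```

The result scales linearly with each input, so a different region or a measured power draw changes the estimate proportionally.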
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
If you use this model, please cite it as follows: |
|
```
@misc{CBIS-DDSM-CNN,
  author = {Lorenzo Maiuri},
  title = {CBIS-DDSM-CNN},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}
```