---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---
# CBIS-DDSM-CNN

CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.

The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Lorenzo Maiuri
- **Funded by:** None (no external funding)
- **Shared by:** Lorenzo Maiuri
- **Model type:** Image Classification
- **License:** MIT

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
- **Demo:** Coming soon...

## Uses

### Try It Out

Coming soon...

### Direct Use

```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0  

        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image (replace this path with your own image)
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)

    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```

### Downstream Use
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.

### Out-of-Scope Use

🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.

## Bias, Risks, and Limitations
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.
  
### Recommendations

- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
- Explainability tools: Such as Grad-CAM or SHAP to help radiologists interpret predictions.
- Continuous evaluation: With real-world clinical data before integration into healthcare systems.

## Training Details

### Training Data
- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout)
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Validation Split**: 20%
- **Callbacks**: Early Stopping, ReduceLROnPlateau
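To make the two callbacks concrete, the sketch below replays a made-up sequence of validation losses through simplified versions of the Early Stopping and ReduceLROnPlateau rules. Only the initial learning rate of 0.001 comes from this card; the patience values, the reduction factor, and the loss values are illustrative assumptions, and the function `simulate_callbacks` is a hypothetical plain-Python approximation, not the Keras implementation.

```python
def simulate_callbacks(val_losses, lr=0.001, factor=0.5,
                       lr_patience=2, stop_patience=4):
    """Replay validation losses through simplified early-stopping and
    reduce-on-plateau rules. All defaults except lr are hypothetical."""
    best = float("inf")
    epochs_since_best = 0      # drives early stopping
    epochs_since_lr_drop = 0   # drives learning-rate reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset both counters.
            best = loss
            epochs_since_best = 0
            epochs_since_lr_drop = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_drop += 1
        if epochs_since_lr_drop >= lr_patience:
            # Plateau detected: shrink the learning rate.
            lr *= factor
            epochs_since_lr_drop = 0
        history.append((epoch, loss, lr))
        if epochs_since_best >= stop_patience:
            # No improvement for stop_patience epochs: stop training.
            break
    return history


# Made-up validation losses that improve, then plateau.
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.63, 0.62, 0.61, 0.5]
for epoch, loss, lr in simulate_callbacks(losses):
    print(f"epoch {epoch}: val_loss={loss:.2f} lr={lr:.6f}")
```

With these assumed settings the learning rate is halved twice during the plateau and training stops before the later, better losses are ever seen, which is the trade-off a patience value controls.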

#### Preprocessing
- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization
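The steps above can be sketched with NumPy alone. This is a minimal illustration on a synthetic image, not the training pipeline: the card does not say which contrast-enhancement method was used, so min-max stretching stands in for it (and also performs the 0 to 1 scaling), only the flip augmentation is shown, and the helper names are hypothetical. Resizing is omitted because the synthetic image is already (50, 50).

```python
import numpy as np


def to_grayscale(rgb):
    """Luminance-weighted grayscale using the common BT.601 weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights


def stretch_contrast(gray):
    """Min-max contrast stretching to [0, 1]; an assumed stand-in for the
    card's unspecified contrast-enhancement step."""
    lo, hi = float(gray.min()), float(gray.max())
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros_like(gray, dtype=np.float32)
    return ((gray - lo) / (hi - lo)).astype(np.float32)


def augment_flip(img, rng):
    """Random horizontal flip, one of the augmentations listed above."""
    return img[:, ::-1] if rng.random() < 0.5 else img


rng = np.random.default_rng(0)
# Synthetic stand-in for a (50, 50) RGB input image.
rgb = rng.integers(0, 256, size=(50, 50, 3), dtype=np.uint8)
gray = stretch_contrast(to_grayscale(rgb))
sample = augment_flip(gray, rng)
print(gray.shape, float(gray.min()), float(gray.max()))
```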

#### Training Hyperparameters

- **Epochs:** 20
- **Batch Size:** 75
- **Learning Rate:** 0.001
- **Optimizer:** Adam
- **Dropout Rate:** 0.4

#### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- Total Training Time: 33m
- Hardware Used: Tesla P100

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

The model was evaluated on the test split of the CBIS-DDSM dataset.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

The following metrics were computed for evaluation:
- Accuracy
- Confusion Matrix

### Results

- Accuracy: 0.9789

#### Summary

The model reaches an accuracy of 0.9789 on the test split. Accuracy alone does not show how errors divide between false positives and false negatives, so it should be read together with the confusion matrix.

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Tesla P100
- **Hours used:** 0.33
- **Cloud Provider:** Kaggle
- **Carbon Emitted:** 0.04
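For reference, an estimate of this kind is energy used multiplied by the carbon intensity of the electricity supply. The sketch below applies that formula to the 0.33 hours reported above. The 250 W board power and the 0.432 kg CO2eq/kWh carbon intensity are assumptions chosen for illustration; neither value is stated in this card, and the function name is hypothetical.

```python
def estimate_emissions_kg(power_watts, hours, kg_co2_per_kwh):
    """Estimate emissions as energy (kWh) times carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


estimate = estimate_emissions_kg(
    power_watts=250,        # assumed GPU board power, not from this card
    hours=0.33,             # "Hours used" as reported above
    kg_co2_per_kwh=0.432,   # assumed grid carbon intensity, not from this card
)
print(f"{estimate:.4f} kg CO2eq")
```

Under these assumptions the estimate is roughly 0.036 kg CO2eq; a different region or a different power draw would change the result proportionally.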

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

If you use this model, please cite it as follows:
```bibtex
@misc{CBIS-DDSM-CNN,
  author = {Lorenzo Maiuri},
  title = {CBIS-DDSM-CNN},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}
```