---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---
# CBIS-DDSM-CNN

CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.

The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Lorenzo Maiuri
- **Funded by:** No funds
- **Shared by:** Lorenzo Maiuri
- **Model type:** Image Classification
- **License:** MIT

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
- **Demo:** Coming soon...

## Uses

### Try It Out

Coming soon...

### Direct Use

```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0  

        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)

    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```

### Downstream Use
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.

### Out-of-Scope Use

🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.

## Bias, Risks, and Limitations
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.
  
### Recommendations

- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
- Explainability tools: Such as Grad-CAM or SHAP to help radiologists interpret predictions.
- Continuous evaluation: With real-world clinical data before integration into healthcare systems.

## Training Details

### Training Data
- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout)
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Validation Split**: 20%
- **Callbacks**: Early Stopping, ReduceLROnPlateau

#### Preprocessing
- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization
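The steps above can be sketched in plain NumPy. This is an illustrative approximation, not the training notebook's code: the function names are hypothetical, the percentile-based contrast stretch is an assumed stand-in for the unspecified contrast enhancement, nearest-neighbour sampling stands in for the actual resize method, and zoom augmentation is omitted.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights

def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Assumed contrast enhancement: clip to percentiles, rescale to [0, 1]."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image, nothing to stretch
        return np.zeros_like(gray, dtype=np.float32)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0).astype(np.float32)

def resize_nearest(img, size=(50, 50)):
    """Nearest-neighbour resize to (height, width) using index sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]

def preprocess(rgb):
    """Grayscale -> contrast stretch (also normalizes to [0, 1]) -> 50x50."""
    return resize_nearest(stretch_contrast(to_grayscale(rgb)))

def augment(img, rng):
    """Random horizontal flip and a random multiple-of-90-degree rotation."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return np.rot90(img, k=int(rng.integers(0, 4)))
```

Because the contrast stretch already maps values into [0, 1], no separate normalization step is needed in this sketch. A pipeline that skips contrast enhancement would divide by 255 instead, as the Direct Use example does.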

#### Training Hyperparameters

- **Epochs:** 20
- **Batch Size:** 75
- **Learning Rate:** 0.001
- **Optimizer:** Adam
- **Dropout Rate:** 0.4
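A model matching the description above might look roughly like the following Keras sketch. Only the number of convolutional layers, the use of BatchNorm and Dropout, the loss, the optimizer, the learning rate, the dropout rate, and the callback types come from this card. The filter counts, kernel size, pooling, three-channel input, two-unit softmax head, and callback settings are assumptions made for illustration, and `build_cnn` is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import callbacks, layers, models

def build_cnn(input_shape=(50, 50, 3), num_classes=2, dropout_rate=0.4):
    """Four Conv2D + BatchNorm blocks followed by Dropout and a softmax head."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128, 256):  # assumed filter counts
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Callback types are from the card; monitor, patience and factor are assumed.
training_callbacks = [
    callbacks.EarlyStopping(monitor="val_loss", patience=5,
                            restore_best_weights=True),
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
```

With one-hot labels, training would then be a call along the lines of `model.fit(x, y, epochs=20, batch_size=75, validation_split=0.2, callbacks=training_callbacks)`.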

#### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- Total Training Time: 33m
- Hardware Used: Tesla P100

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

The model was evaluated on the test split of the CBIS-DDSM dataset.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

The following metrics were computed for evaluation:
- Accuracy
- Confusion Matrix
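As a minimal sketch of how these metrics relate, the snippet below builds a confusion matrix from integer labels and derives accuracy from it, along with the precision and recall listed in the card metadata. The function names and the choice of class `1` as the positive (cancer) class are assumptions for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

def summarize(cm, positive=1):
    """Accuracy, precision and recall for the chosen positive class."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp
    fn = cm[positive, :].sum() - tp
    return {
        "accuracy": float(np.trace(cm) / cm.sum()),
        "precision": float(tp / (tp + fp)) if tp + fp else 0.0,
        "recall": float(tp / (tp + fn)) if tp + fn else 0.0,
    }
```

For a screening task, recall on the cancer class is usually the figure to watch, since a false negative is a missed lesion.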

### Results

- Accuracy: 0.9789

#### Summary

The model reaches an accuracy of 0.9789 on the CBIS-DDSM test split. Accuracy alone does not show how errors divide between false positives and false negatives, so the confusion matrix should be read alongside it.

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Tesla P100
- **Hours used:** 0.33
- **Cloud Provider:** Kaggle
- **Carbon Emitted:** 0.04
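Estimates of this kind generally multiply the energy drawn by the hardware by the carbon intensity of the local grid. The sketch below shows that arithmetic only. The function name is hypothetical, and the power draw and carbon intensity in the usage line are placeholder assumptions, not values taken from this card or from the calculator.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh):
    """Estimated emissions in kg CO2eq: energy (kWh) x grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * carbon_intensity_kg_per_kwh

# Placeholder inputs: an assumed 250 W GPU power draw, the 0.33 hours listed
# above, and an assumed grid intensity of 0.432 kg CO2eq per kWh.
estimate = estimate_emissions_kg(250, 0.33, 0.432)
```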

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

If you use this model, please cite it as follows:
```bibtex
@misc{CBIS-DDSM-CNN,
  author = {Lorenzo Maiuri},
  title = {CBIS-DDSM-CNN},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}
```