---
license: mit
metrics:
- accuracy
- confusion_matrix
- precision
- recall
pipeline_tag: image-classification
library_name: keras
tags:
- medical
---

# CBIS-DDSM-CNN

CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research. The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.

## Model Details

### Model Description

- **Developed by:** Lorenzo Maiuri
- **Funded by:** No funds
- **Shared by:** Lorenzo Maiuri
- **Model type:** Image Classification
- **License:** MIT

### Model Sources

- **Repository:** [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
- **Dataset:** [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
- **Dataset:** [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
- **Kaggle Notebook:** [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
- **Demo:** Coming soon...

## Uses

### Try It Out

Coming soon...

### Direct Use

```python
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0
        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)
    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
```

### Downstream Use

- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.

### Out-of-Scope Use

🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation and regulatory approval. It is intended for research and educational purposes only.

## Bias, Risks, and Limitations

- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.

### Recommendations

- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
- Explainability tools: Such as Grad-CAM or SHAP to help radiologists interpret predictions.
- Continuous evaluation: With real-world clinical data before integration into healthcare systems.

## Training Details

### Training Data

- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments

### Training Procedure

- **Model Architecture**: CNN (4 Convolutional layers + BatchNorm + Dropout)
- **Loss Function**: Categorical Cross-Entropy
- **Optimizer**: Adam
- **Validation Split**: 20%
- **Callbacks**: Early Stopping, ReduceLROnPlateau

#### Preprocessing

- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization

#### Training Hyperparameters

- **Epochs:** 20
- **Batch Size:** 75
- **Learning Rate:** 0.001
- **Optimizer:** Adam
- **Dropout Rate:** 0.4

#### Speeds, Sizes, Times

- Total Training Time: 33m
- Hardware Used: Tesla P100

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on the test split of the CBIS-DDSM dataset.

#### Metrics

The following metrics were computed for evaluation:

- Accuracy
- Confusion Matrix

### Results

- Accuracy: 0.9789

#### Summary

The model reaches an accuracy of 0.9789 on the CBIS-DDSM test split.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

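
As a rough illustration of how such an estimate is formed, the sketch below multiplies power draw, runtime, and grid carbon intensity. It is not the calculator's actual implementation. The 250 W figure is the nominal TDP of the PCIe Tesla P100 and the 0.33 hours comes from this card; the carbon intensity and PUE values are placeholder assumptions that depend on the data center and region, and the function name is hypothetical.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Approximate CO2-equivalent emissions in kilograms.

    power_watts: assumed average power draw of the hardware.
    hours: wall-clock runtime.
    carbon_intensity_kg_per_kwh: grid carbon intensity (region dependent).
    pue: data center power usage effectiveness (1.0 means no overhead).
    """
    energy_kwh = (power_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


if __name__ == "__main__":
    # 250 W: nominal TDP of a PCIe Tesla P100.
    # 0.33 h: "Hours used" as reported in this card.
    # 0.4 kg/kWh: placeholder grid intensity, NOT a measured value.
    estimate = estimate_emissions_kg(250, 0.33, 0.4)
    print(f"Estimated emissions: {estimate:.3f} kg CO2eq")
```

Actual power draw is typically below TDP, and the real grid intensity varies by region, so the result should be read as an order-of-magnitude figure for the GPU alone.
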
- **Hardware Type:** Tesla P100
- **Hours used:** 0.33
- **Cloud Provider:** Kaggle
- **Carbon Emitted:** 0.04

## Citation

If you use this model, please cite it as follows:

```
@misc{CBIS-DDSM-CNN,
  author = {Lorenzo Maiuri},
  title = {CBIS-DDSM-CNN},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}
```