	---
	license: mit
	metrics:
	- accuracy
	- confusion_matrix
	- precision
	- recall
	pipeline_tag: image-classification
	library_name: keras
	tags:
	- medical
	---
	# CBIS-DDSM-CNN

	CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.

	The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.

	## Model Details

	### Model Description

	- Developed by: Lorenzo Maiuri
	- Funded by: None
	- Shared by: Lorenzo Maiuri
	- Model type: Image Classification
	- License: MIT

	### Model Sources

	- Repository: [Hugging Face Model Repository](https://huggingface.co/maiurilorenzo/CBIS-DDSM-CNN)
	- Dataset: [CBIS-DDSM (Curated Breast Imaging Subset DDSM)](https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset)
	- Dataset: [Breast Histopathology Images](https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images)
	- Kaggle Notebook: [Link to Kaggle Notebook](https://www.kaggle.com/code/lorenzomaiuri/cbis-ddsm-cancer-detection-cnn)
	- Demo: Coming soon...

	## Uses

	### Try It Out

	Coming soon...

	### Direct Use

	```python
	from huggingface_hub import hf_hub_download
	import tensorflow as tf
	import cv2
	import numpy as np
	import json
	import matplotlib.pyplot as plt
	
	# Load model
	repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
	model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
	model = tf.keras.models.load_model(model_path)
	
	# Load preprocessing info (e.g. the target image size used during training)
	preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
	with open(preprocessing_path, "r") as f:
	    preprocessing_info = json.load(f)
	
	# Define preprocessing function
	def load_and_preprocess_image(image_path):
	    try:
	        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
	        if img is None:
	            raise ValueError(f"Could not read image: {image_path}")
	
	        # OpenCV loads images as BGR; convert to RGB
	        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
	        # cv2.resize expects the target size as (width, height)
	        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
	        # Scale pixel values to [0, 1]
	        img_array = img.astype(np.float32) / 255.0
	
	        return img_array
	    except Exception as e:
	        print(f"Error processing {image_path}: {str(e)}")
	        return None
	
	# Load and preprocess an example image (replace with the path to your own image)
	image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
	img_array = load_and_preprocess_image(image_path)
	
	if img_array is not None:
	    img_batch = np.expand_dims(img_array, axis=0)  # add a batch dimension
	    predictions = model.predict(img_batch)
	
	    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
	    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"
	
	    plt.imshow(img_array)
	    plt.title(f"Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}")
	    plt.axis("off")
	    plt.show()
	else:
	    print("Image loading and preprocessing failed.")
	```
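
The card lists categorical cross-entropy as the loss, which suggests (but does not confirm) a softmax output with one probability per class. The snippet above reads `predictions[0][0]` under the stated assumption that "Cancer" is index 0. The minimal helper below, whose name `interpret_prediction` and default class order are hypothetical, makes that class order an explicit argument so it can be corrected once the training label order is known:

```python
import numpy as np


def interpret_prediction(probs, class_names=("Cancer", "Normal")):
    """Return (label, probability) for one softmax output row.

    The default class order is an assumption: it must match the label order
    used when the model was trained.
    """
    probs = np.asarray(probs, dtype=np.float64).ravel()  # accepts shape (2,) or (1, 2)
    if probs.shape[0] != len(class_names):
        raise ValueError(f"expected {len(class_names)} probabilities, got {probs.shape[0]}")
    idx = int(np.argmax(probs))  # np.argmax returns the first index on ties
    return class_names[idx], float(probs[idx])


# Usage with the Direct Use snippet: label, prob = interpret_prediction(predictions[0])
```

For two classes, taking the argmax gives the same label as thresholding the first probability at 0.5, including an exact tie, because `np.argmax` returns the first maximal index.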

	### Downstream Use
	- Medical Research: Can be used to assist in studying breast cancer detection techniques.
	- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
	- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
	- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.

	### Out-of-Scope Use

	🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.

	## Bias, Risks, and Limitations
	- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
	- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
	- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
	- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.

	### Recommendations

	- Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
	- Explainability tools: Methods such as Grad-CAM or SHAP can help radiologists interpret predictions.
	- Continuous evaluation: Validate the model on real-world clinical data before any integration into healthcare systems.
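
Grad-CAM, one of the explainability tools named above, weights the feature maps of a convolutional layer by the spatially averaged gradient of a class score and keeps only the positive part of the weighted sum. The sketch below is a standard `tf.GradientTape` formulation, not code from this repository; the function name `grad_cam` and the `conv_layer_name` argument are hypothetical, and since the card does not document layer names, the right layer in `CNN_model.h5` has to be looked up with `model.summary()`:

```python
import numpy as np
import tensorflow as tf


def grad_cam(model, image, conv_layer_name, class_index=None):
    """Grad-CAM heatmap for one image of shape (H, W, C).

    Returns an array in [0, 1] with the spatial size of the chosen layer's output.
    `conv_layer_name` is an assumption: it must name a convolutional layer in `model`.
    """
    # Model that exposes both the chosen feature maps and the final predictions
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer_name).output, model.output]
    )
    batch = tf.convert_to_tensor(np.asarray(image)[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(batch)
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the predicted class
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)           # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # one weight per channel
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam / (tf.reduce_max(cam) + 1e-8)           # normalise to [0, 1]
    return cam.numpy()
```

The heatmap has the spatial size of the chosen layer's output, so it needs upsampling (for example with `cv2.resize`) before being overlaid on the input image.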

	## Training Details

	### Training Data
	- Dataset: Breast Histopathology Images
	- Image Types: High-resolution mammograms
	- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
	- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments

	### Training Procedure

	- Model Architecture: CNN (4 Convolutional layers + BatchNorm + Dropout)
	- Loss Function: Categorical Cross-Entropy
	- Optimizer: Adam
	- Validation Split: 20%
	- Callbacks: Early Stopping, ReduceLROnPlateau
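
The architecture summary above can be sketched in Keras as follows. Only the layer types (four convolutional layers with batch normalization and dropout), the loss, the optimizer, the learning rate (0.001) and the dropout rate (0.4) come from this card; the filter counts, kernel size, pooling and classification head are assumptions, and `build_model` is a hypothetical name. `input_shape` uses the (50, 50) size from the Preprocessing section with three channels, because the Direct Use snippet passes RGB images; pass `(50, 50, 1)` for grayscale inputs:

```python
import tensorflow as tf


def build_model(input_shape=(50, 50, 3), num_classes=2, dropout_rate=0.4, learning_rate=1e-3):
    """Hypothetical 4-block CNN: Conv2D (ReLU) -> BatchNorm -> MaxPool -> Dropout, four times."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128, 256):  # assumed filter progression
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))  # 50 -> 25 -> 12 -> 6 -> 3
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_classes, activation="softmax"))  # one probability per class

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model
```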

	#### Preprocessing
	- Grayscale conversion for reduced complexity
	- Contrast enhancement for better lesion visibility
	- Image resizing to (50, 50) pixels
	- Normalization (scaling pixel values between 0 and 1)
	- Data augmentation (flipping, rotation, zooming) to improve generalization
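
The preprocessing steps above can be sketched with NumPy alone. This is an illustration under assumptions, not the notebook's code: grayscale conversion uses the ITU-R BT.601 luma weights (the same ones OpenCV uses for `COLOR_RGB2GRAY`), "contrast enhancement" is implemented as a percentile stretch (CLAHE via `cv2.createCLAHE` is a common alternative), resizing is nearest-neighbour rather than the `cv2.INTER_AREA` used in the Direct Use snippet, rotation is limited to multiples of 90 degrees, and the zoom range is arbitrary. All function names are hypothetical. Note that the Direct Use snippet passes 3-channel RGB arrays to the published model, so check `preprocessing.json` before assuming the weights expect grayscale input.

```python
import numpy as np


def to_grayscale(rgb):
    """(H, W, 3) RGB -> (H, W) float grayscale using BT.601 luma weights."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Assumed contrast enhancement: map the given percentiles to 0 and 255."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(gray)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0) * 255.0


def resize_nearest(img, size=(50, 50)):
    """Nearest-neighbour resize to (height, width) using index arrays."""
    h, w = img.shape[:2]
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    return img[rows[:, None], cols]


def preprocess(rgb, size=(50, 50)):
    """Grayscale -> contrast stretch -> resize -> scale to [0, 1]."""
    gray = stretch_contrast(to_grayscale(rgb))
    return (resize_nearest(gray, size) / 255.0).astype(np.float32)


def augment(img, rng):
    """Random horizontal flip, random 90-degree rotation and random centre zoom."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    img = np.rot90(img, k=int(rng.integers(4)))
    zoom = rng.uniform(1.0, 1.2)  # assumed zoom range
    h, w = img.shape[:2]
    ch, cw = int(round(h / zoom)), int(round(w / zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    return resize_nearest(img[top:top + ch, left:left + cw], (h, w))
```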

	#### Training Hyperparameters

	- Epochs: 20
	- Batch Size: 75
	- Learning Rate: 0.001
	- Optimizer: Adam
	- Dropout Rate: 0.4
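
The training loop can be sketched with the hyperparameters listed above (20 epochs, batch size 75, 20% validation split) and the two callbacks named under Training Procedure. The `train` helper is hypothetical, and the monitored metric, patience values, reduction factor and minimum learning rate are assumptions, since the card only names the callbacks. It expects an already compiled model and one-hot labels `y`, matching categorical cross-entropy:

```python
import tensorflow as tf


def train(model, x, y, epochs=20, batch_size=75, validation_split=0.2):
    """Fit a compiled Keras model; callback settings below are assumed values."""
    callbacks = [
        # Stop when validation loss stops improving and keep the best weights
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        # Halve the learning rate after 2 epochs without improvement
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6
        ),
    ]
    return model.fit(
        x,
        y,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=validation_split,
        callbacks=callbacks,
    )
```

Keras takes the `validation_split` fraction from the last samples of `x` and `y` before shuffling, so data ordered by class should be shuffled first, otherwise the validation set may contain a single class.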

	#### Speeds, Sizes, Times

	- Total Training Time: 33 minutes
	- Hardware Used: Tesla P100

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Testing Data

	The model was evaluated on the test split of the CBIS-DDSM dataset.

	#### Metrics

	The following metrics were computed for evaluation:
	- Accuracy
	- Confusion Matrix
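
Both metrics above can be computed from true and predicted class indices. The sketch below builds the confusion matrix with NumPy and also derives precision and recall, which the card's metadata lists but which are not reported here; treating index 1 as the positive (cancerous) class is an assumption, and the function names are hypothetical:

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (true, pred) pair
    return cm


def binary_metrics(cm, positive=1):
    """Accuracy, plus precision and recall for the class index `positive`."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp  # predicted positive, actually another class
    fn = cm[positive, :].sum() - tp  # actually positive, predicted another class
    accuracy = np.trace(cm) / cm.sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": float(accuracy), "precision": float(precision), "recall": float(recall)}
```

For a screening task, recall on the cancerous class (sensitivity) is the fraction of cancers the model actually flags, which accuracy alone can hide when benign cases dominate the test set.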

	### Results

	- Accuracy: 0.9789

	#### Summary

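	The model reaches an accuracy of 0.9789 on the CBIS-DDSM test split. Accuracy alone does not show how errors divide between missed cancers (false negatives) and false alarms (false positives), so the confusion matrix should be read alongside it before drawing conclusions about screening performance.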

	## Environmental Impact

	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: Tesla P100
	- Hours used: 0.55 (33 minutes)
	- Cloud Provider: Kaggle
	- Carbon Emitted: 0.04 kg CO2eq
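
The basic arithmetic behind such estimates is energy (kWh) = power (kW) × time (h), multiplied by the carbon intensity of the grid. In the sketch below, 250 W is the rated board power of a PCIe Tesla P100 (assuming Kaggle's P100 is that variant), 33/60 h comes from the training time reported above, and 0.5 kg CO2eq/kWh is a placeholder; the calculator applies a region-specific intensity, so this does not reproduce the reported figure. The function name is hypothetical:

```python
def estimate_co2_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Estimated emissions in kg CO2eq for a device drawing `power_watts` for `hours`."""
    energy_kwh = power_watts / 1000.0 * hours  # W -> kW, then kW x h = kWh
    return energy_kwh * kg_co2_per_kwh


# Hypothetical inputs: P100 PCIe board power, 33 minutes, placeholder grid intensity
print(f"{estimate_co2_kg(250, 33 / 60, 0.5):.4f} kg CO2eq")
```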

	## Citation

	If you use this model, please cite it as follows:
	```
	@misc{CBIS-DDSM-CNN,
	author = {Lorenzo Maiuri},
	title = {CBIS-DDSM-CNN},
	year = {2025},
	publisher = {Hugging Face Hub},
	license = {MIT}
	}
	```