[GitHub Repo] [Research Paper (coming soon)]
Serum-MiR-CanPred: An Artificial Intelligence-Driven Framework for Pan-Cancer Prediction Using a Minimal Set of Circulating miRNA Biomarkers
Summary
This study presents Serum-MiR-CanPred, a machine learning framework that uses serum microRNA (miRNA) expression data to non-invasively diagnose 13 different cancer types. Using a multilayer perceptron (MLP) model with SHAP for interpretability, the method achieved an AUC of 99.87% and identified key discriminatory miRNAs, including hsa-miR-5100. Literature validation and molecular docking indicated that AC1MMYR2, a compound targeting the Dicer site, binds stably to pre-miR-5100, suggesting therapeutic potential. This integrative approach demonstrates the dual utility of circulating miRNAs as diagnostic biomarkers and therapeutic targets, offering a promising direction for AI-driven, non-invasive cancer diagnostics and drug discovery.
Dataset
- Source: [GEO Database](https://www.ncbi.nlm.nih.gov/geo/)
- GEO Accession: GSE212211, GSE113740, GSE211692, GSE164174
- Preprocessing: To determine whether a miRNA was present, sample signals were compared against a threshold derived from blank signals. After removing the top and bottom 5% of blank signal intensities, the threshold was calculated as the mean plus two standard deviations of the remaining values. For detected miRNAs, the mean of the filtered blank signals was subtracted from the sample intensities. Signals that were not detected were assigned a value of 0.1 on a log2 scale. Data were normalized using internal control miRNAs (hsa-miR-149-3p, hsa-miR-2861, hsa-miR-4463). Batch effects were adjusted using PyComBat, and the datasets were labelled and combined for further analysis.
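
The blank-based detection and background subtraction steps described in the preprocessing notes can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `blank_stats` and `correct_signals` are hypothetical, the use of the sample standard deviation is an assumption, and taking log2 after blank subtraction is inferred from the "log2 scale" wording. Normalization to the internal control miRNAs and PyComBat batch correction are not shown.

```python
import math
import statistics


def blank_stats(blank_signals, trim_fraction=0.05):
    """Return (trimmed mean, detection threshold) from blank spot intensities.

    The top and bottom `trim_fraction` of blank values are dropped, then the
    threshold is the mean plus two standard deviations of what remains.
    """
    ordered = sorted(blank_signals)
    k = int(len(ordered) * trim_fraction)
    trimmed = ordered[k:len(ordered) - k] if k > 0 else ordered
    mean = statistics.fmean(trimmed)
    sd = statistics.stdev(trimmed)  # assumption: sample (n - 1) standard deviation
    return mean, mean + 2 * sd


def correct_signals(sample_signals, blank_signals, floor_log2=0.1):
    """Background-correct detected signals; undetected ones get `floor_log2`."""
    blank_mean, threshold = blank_stats(blank_signals)
    corrected = []
    for signal in sample_signals:
        if signal > threshold:
            # Detected: subtract the trimmed blank mean, then move to log2 scale.
            corrected.append(math.log2(signal - blank_mean))
        else:
            # Not detected: fixed value of 0.1 on the log2 scale.
            corrected.append(floor_log2)
    return corrected


# Toy intensities, purely illustrative (not taken from the GEO datasets).
blanks = list(range(1, 21))
result = correct_signals([5.0, 21.0, 42.5], blanks)
```

With these toy values only the third signal clears the threshold, so the first two receive the 0.1 floor.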
Training Details
- Number of Layers: 3
- Units: 512
- Batch Size: 256
- Epochs: 100
- Dropout Rate: 0.4
- Learning Rate: 0.00032506541805349084
- Hardware: NVIDIA GeForce RTX 3050
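
To make the listed hyperparameters easier to reuse, they can be collected into a small configuration object that also derives the dense-layer shapes. This is only a sketch under stated assumptions: `MLPConfig` and its methods are hypothetical names, "Number of Layers: 3" is read as three hidden layers of 512 units each, and `n_features=100` is a placeholder because the number of input miRNAs is not given here. The 13 output classes match the rows of the performance table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLPConfig:
    """Hyperparameters as listed under Training Details."""
    hidden_layers: int = 3  # assumption: the 3 layers are hidden layers
    units: int = 512
    batch_size: int = 256
    epochs: int = 100
    dropout_rate: float = 0.4
    learning_rate: float = 0.00032506541805349084

    def layer_shapes(self, n_features, n_classes):
        """Return (inputs, outputs) for each dense layer, output layer last."""
        sizes = [n_features] + [self.units] * self.hidden_layers + [n_classes]
        return list(zip(sizes[:-1], sizes[1:]))

    def parameter_count(self, n_features, n_classes):
        """Count weights plus biases across all dense layers."""
        return sum(n_in * n_out + n_out
                   for n_in, n_out in self.layer_shapes(n_features, n_classes))


config = MLPConfig()
# n_features=100 is a placeholder; substitute the real number of miRNA inputs.
shapes = config.layer_shapes(n_features=100, n_classes=13)
total_params = config.parameter_count(n_features=100, n_classes=13)
```

The shapes can then be passed to whichever deep learning framework is used to build the actual network; the framework itself is not specified in this card.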
Performance
The model was validated on 4055 samples (20% of the dataset).
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
BL | 0.89 | 0.85 | 0.87 | 85 |
BR | 0.94 | 0.97 | 0.96 | 140 |
BT | 0.80 | 0.84 | 0.82 | 94 |
CR | 0.91 | 0.87 | 0.89 | 334 |
ES | 0.89 | 0.89 | 0.89 | 128 |
GA | 0.97 | 0.99 | 0.98 | 572 |
HC | 0.95 | 0.95 | 0.95 | 215 |
LU | 0.93 | 0.94 | 0.94 | 345 |
NC | 0.99 | 0.99 | 0.99 | 1606 |
OV | 0.84 | 0.87 | 0.86 | 85 |
PA | 0.98 | 0.91 | 0.94 | 175 |
PR | 0.94 | 0.98 | 0.96 | 211 |
SA | 0.88 | 0.91 | 0.89 | 65 |
macro avg | 0.92 | 0.92 | 0.92 | 4055 |
weighted avg | 0.96 | 0.96 | 0.96 | 4055 |
Overall Accuracy = 96%
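
The macro and weighted averages in the table can be cross-checked from the per-class rows. The sketch below copies the rounded table values and recomputes both averages; the names `REPORT`, `macro_average`, and `weighted_average` are illustrative only. Because the inputs are already rounded to two decimals, the recomputed figures may differ from the reported ones in the last digit.

```python
# (precision, recall, f1, support) per class, copied from the table above.
REPORT = {
    "BL": (0.89, 0.85, 0.87, 85),
    "BR": (0.94, 0.97, 0.96, 140),
    "BT": (0.80, 0.84, 0.82, 94),
    "CR": (0.91, 0.87, 0.89, 334),
    "ES": (0.89, 0.89, 0.89, 128),
    "GA": (0.97, 0.99, 0.98, 572),
    "HC": (0.95, 0.95, 0.95, 215),
    "LU": (0.93, 0.94, 0.94, 345),
    "NC": (0.99, 0.99, 0.99, 1606),
    "OV": (0.84, 0.87, 0.86, 85),
    "PA": (0.98, 0.91, 0.94, 175),
    "PR": (0.94, 0.98, 0.96, 211),
    "SA": (0.88, 0.91, 0.89, 65),
}


def macro_average(report, index):
    """Unweighted mean of one metric: every class counts equally."""
    return sum(row[index] for row in report.values()) / len(report)


def weighted_average(report, index):
    """Mean of one metric weighted by each class's support."""
    total = sum(row[3] for row in report.values())
    return sum(row[index] * row[3] for row in report.values()) / total


total_support = sum(row[3] for row in REPORT.values())
macro = [macro_average(REPORT, i) for i in range(3)]        # precision, recall, F1
weighted = [weighted_average(REPORT, i) for i in range(3)]  # precision, recall, F1
```

The support-weighted recall is mathematically the same quantity as overall accuracy, which is why it lines up with the 96% figure. The gap between the macro and weighted averages reflects the class imbalance: the large NC class pulls the weighted figures up, while smaller classes such as BT and OV pull the macro figures down.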
Citation
If using this model, please cite:
License
MIT License