📢 [GitHub Repo] [Research Paper (coming soon)]

Serum-MiR-CanPred: An Artificial Intelligence-Driven Framework for Pan-Cancer Prediction Using a Minimal Set of Circulating miRNA Biomarkers

Summary

This study presents Serum-MiR-CanPred, a machine learning framework that leverages serum microRNA (miRNA) expression data to non-invasively diagnose 13 different cancer types. Using a multilayer perceptron (MLP) model and SHAP for interpretability, the method achieved high discriminative performance (AUC of 99.87%) and identified key discriminatory miRNAs, including hsa-miR-5100. Literature validation and molecular docking indicated that AC1MMYR2, a compound targeting the Dicer site, binds stably to pre-miR-5100, suggesting therapeutic potential. This integrative approach demonstrates the dual utility of circulating miRNAs as diagnostic biomarkers and therapeutic targets, offering a promising direction for AI-driven, non-invasive cancer diagnostics and drug discovery.

Dataset

Source: [GEO Database](https://www.ncbi.nlm.nih.gov/geo/)
GEO Accession: GSE212211, GSE113740, GSE211692, GSE164174
Preprocessing: To determine whether a miRNA was present, sample signals were compared against a threshold derived from blank signals. After the top and bottom 5% of blank signal intensities were removed, the threshold was calculated as the mean plus two standard deviations of the remaining values. For detected miRNAs, the mean of the filtered blank signals was subtracted from the sample intensities. Signals that were not detected were assigned a value of 0.1 on a log2 scale. Data were normalized using internal control miRNAs (hsa-miR-149-3p, hsa-miR-2861, hsa-miR-4463). Batch effects were adjusted using PyComBat, and the datasets were labelled and combined for further analysis.
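
The detection and background-subtraction steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's actual code: the function names, the toy intensities, the use of the sample standard deviation, and the log2 transform applied to detected signals are all assumptions. Normalization against the internal control miRNAs and PyComBat batch correction are not shown.

```python
import math
import statistics


def blank_stats(blank_signals, trim_fraction=0.05):
    """Trim the top and bottom fraction of blank intensities, then
    return (mean, detection threshold) of the remaining values."""
    ordered = sorted(blank_signals)
    k = int(len(ordered) * trim_fraction)  # values dropped from each end
    kept = ordered[k:len(ordered) - k]
    mean = statistics.fmean(kept)
    sd = statistics.stdev(kept)  # sample SD; the source does not say which SD
    return mean, mean + 2 * sd


def preprocess_sample(signals, blank_mean, threshold, floor_log2=0.1):
    """Background-subtract detected miRNAs; floor undetected ones at 0.1."""
    out = {}
    for mirna, intensity in signals.items():
        if intensity > threshold:
            # Assumption: detected signals are log2-transformed after
            # subtracting the filtered blank mean.
            out[mirna] = math.log2(intensity - blank_mean)
        else:
            out[mirna] = floor_log2
    return out


# Toy data (hypothetical intensities, not taken from the GEO series).
blanks = [float(v) for v in range(1, 21)]
blank_mean, threshold = blank_stats(blanks)
sample = {"mirna_a": 74.5, "mirna_b": 15.0}
processed = preprocess_sample(sample, blank_mean, threshold)
print(processed)  # {'mirna_a': 6.0, 'mirna_b': 0.1}
```

With 20 blank values, one value is trimmed from each end, so the threshold is computed from the middle 18.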

Training Details

Number of Layers: 3
Units: 512
Batch Size: 256
Epochs: 100
Dropout Rate: 0.4
Learning Rate: 0.00032506541805349084
Hardware: NVIDIA GeForce RTX 3050
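
As a rough illustration of how these hyperparameters might fit together, here is a minimal PyTorch sketch. The framework, the ReLU activations, the Adam optimizer, the cross-entropy loss, and the input width of 100 features are all assumptions made for the example. The source does not state them, and "Number of Layers: 3" is read here as three hidden layers.

```python
import torch
from torch import nn


def build_mlp(n_features, n_classes=13, n_layers=3, units=512, dropout=0.4):
    """Stack n_layers hidden blocks (Linear -> ReLU -> Dropout), then an output layer."""
    layers = []
    in_dim = n_features
    for _ in range(n_layers):
        layers += [nn.Linear(in_dim, units), nn.ReLU(), nn.Dropout(dropout)]
        in_dim = units
    layers.append(nn.Linear(in_dim, n_classes))  # one logit per class
    return nn.Sequential(*layers)


N_FEATURES = 100  # hypothetical number of input miRNAs
model = build_mlp(N_FEATURES)
optimizer = torch.optim.Adam(model.parameters(), lr=0.00032506541805349084)
loss_fn = nn.CrossEntropyLoss()

# One training step on random stand-in data with the listed batch size of 256.
x = torch.randn(256, N_FEATURES)
y = torch.randint(0, 13, (256,))
optimizer.zero_grad()
logits = model(x)
loss = loss_fn(logits, y)
loss.backward()
optimizer.step()
```

A full run would repeat this step over mini-batches of the real training data for the listed 100 epochs.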

Performance

The model was validated on 4055 samples (20% of the dataset).

Class	Precision	Recall	F1-Score	Support
BL	0.89	0.85	0.87	85
BR	0.94	0.97	0.96	140
BT	0.80	0.84	0.82	94
CR	0.91	0.87	0.89	334
ES	0.89	0.89	0.89	128
GA	0.97	0.99	0.98	572
HC	0.95	0.95	0.95	215
LU	0.93	0.94	0.94	345
NC	0.99	0.99	0.99	1606
OV	0.84	0.87	0.86	85
PA	0.98	0.91	0.94	175
PR	0.94	0.98	0.96	211
SA	0.88	0.91	0.89	65

macro avg	0.92	0.92	0.92	4055
weighted avg	0.96	0.96	0.96	4055
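
The macro and weighted averages differ because the macro average treats all 13 classes equally, while the weighted average scales each class by its support, so the large NC class (1606 of 4055 samples) pulls it upward. The sketch below recomputes both F1 averages from the rounded per-class values in the table. The variable and function names are illustrative, and because the inputs are already rounded the results match the reported figures only to two decimals.

```python
# (f1, support) per class, copied from the table above.
report = {
    "BL": (0.87, 85), "BR": (0.96, 140), "BT": (0.82, 94),
    "CR": (0.89, 334), "ES": (0.89, 128), "GA": (0.98, 572),
    "HC": (0.95, 215), "LU": (0.94, 345), "NC": (0.99, 1606),
    "OV": (0.86, 85), "PA": (0.94, 175), "PR": (0.96, 211),
    "SA": (0.89, 65),
}


def macro_average(scores):
    """Unweighted mean: every class counts the same."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Support-weighted mean: larger classes count more."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


f1_scores = [f1 for f1, _ in report.values()]
supports = [n for _, n in report.values()]

total_support = sum(supports)
macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, supports)

print(total_support)          # 4055
print(round(macro_f1, 2))     # 0.92
print(round(weighted_f1, 2))  # 0.96
```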
