📢 [GitHub Repo] [Research Paper (coming soon)]

Serum-MiR-CanPred: An Artificial Intelligence-Driven Framework for Pan-Cancer Prediction Using a Minimal Set of Circulating miRNA Biomarkers

Summary

This study presents Serum-MiR-CanPred, a machine learning framework that uses serum microRNA (miRNA) expression data to non-invasively diagnose 13 different cancer types. Using a multilayer perceptron (MLP) model with SHAP for interpretability, the method achieved high discriminative performance (AUC 99.87%) and identified key discriminatory miRNAs, including hsa-miR-5100. Literature validation and molecular docking indicated that AC1MMYR2, a compound targeting the Dicer site, binds stably to pre-miR-5100, suggesting therapeutic potential. This integrative approach demonstrates the dual utility of circulating miRNAs as diagnostic biomarkers and therapeutic targets, and offers a promising direction for AI-driven, non-invasive cancer diagnostics and drug discovery.

Dataset

  • Source: [GEO Database](https://www.ncbi.nlm.nih.gov/geo/)
  • GEO Accessions: GSE212211, GSE113740, GSE211692, GSE164174
  • Preprocessing: miRNA presence was determined by comparing sample signals against a threshold derived from blank signals. After the top and bottom 5% of blank signal intensities were removed, the threshold was calculated as the mean plus two standard deviations of the remaining values. For detected miRNAs, the mean of the filtered blank signals was subtracted from the sample intensities; undetected signals were assigned a value of 0.1 on a log2 scale. Data were normalized using internal control miRNAs (hsa-miR-149-3p, hsa-miR-2861, hsa-miR-4463). Batch effects were adjusted using PyComBat, and the datasets were labelled and combined for further analysis.
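The detection and background-subtraction steps described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the use of the sample standard deviation, the strict `>` comparison, and the log2 transform of detected values are all assumptions. Internal-control normalization and PyComBat batch correction are left out.

```python
import math
import statistics


def blank_stats(blank_signals, trim_fraction=0.05):
    """Return (mean, detection threshold) of the trimmed blank signals.

    Drops the top and bottom `trim_fraction` of blank intensities, then
    sets the threshold to mean + 2 * SD of what remains.
    Assumption: sample (n - 1) standard deviation.
    """
    ordered = sorted(blank_signals)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    mean = statistics.fmean(kept)
    return mean, mean + 2 * statistics.stdev(kept)


def preprocess_sample(sample_signals, blank_signals, undetected_log2=0.1):
    """Map each miRNA to a background-corrected log2 value.

    Detected signals (above the threshold) have the trimmed blank mean
    subtracted and are log2-transformed (assumed); undetected signals
    are set to 0.1 on the log2 scale.
    """
    blank_mean, threshold = blank_stats(blank_signals)
    corrected = {}
    for mirna, signal in sample_signals.items():
        if signal > threshold:
            corrected[mirna] = math.log2(signal - blank_mean)
        else:
            corrected[mirna] = undetected_log2
    return corrected
```

Because a detected signal always exceeds the trimmed blank mean, the subtraction never produces a non-positive value, so the log2 transform is always defined.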

Training Details

  • Number of Layers: 3
  • Units: 512
  • Batch Size: 256
  • Epochs: 100
  • Dropout Rate: 0.4
  • Learning Rate: 0.00032506541805349084
  • Hardware: NVIDIA GeForce RTX 3050
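To make the listed hyperparameters concrete, the sketch below collects them in a small config object and derives the dense-layer shapes they imply. It assumes that "Number of Layers: 3" means three hidden layers of 512 units each, which the card does not confirm. `TrainingConfig`, `layer_shapes`, and `parameter_count` are hypothetical names, and the input feature count is a placeholder because the card does not state how many miRNAs are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the list above."""
    n_layers: int = 3
    units: int = 512
    batch_size: int = 256
    epochs: int = 100
    dropout_rate: float = 0.4
    learning_rate: float = 0.00032506541805349084


def layer_shapes(n_features, n_classes, cfg):
    """Return (fan_in, fan_out) for each dense layer of the assumed MLP."""
    widths = [n_features] + [cfg.units] * cfg.n_layers + [n_classes]
    return list(zip(widths[:-1], widths[1:]))


def parameter_count(shapes):
    """Count weights plus biases across all dense layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)
```

For example, `layer_shapes(100, 13, TrainingConfig())` describes a network with a placeholder 100 input features and the 13 output classes that appear in the performance table.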

Performance

The model was validated on 4055 samples (20% of the dataset).

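| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BL | 0.89 | 0.85 | 0.87 | 85 |
| BR | 0.94 | 0.97 | 0.96 | 140 |
| BT | 0.80 | 0.84 | 0.82 | 94 |
| CR | 0.91 | 0.87 | 0.89 | 334 |
| ES | 0.89 | 0.89 | 0.89 | 128 |
| GA | 0.97 | 0.99 | 0.98 | 572 |
| HC | 0.95 | 0.95 | 0.95 | 215 |
| LU | 0.93 | 0.94 | 0.94 | 345 |
| NC | 0.99 | 0.99 | 0.99 | 1606 |
| OV | 0.84 | 0.87 | 0.86 | 85 |
| PA | 0.98 | 0.91 | 0.94 | 175 |
| PR | 0.94 | 0.98 | 0.96 | 211 |
| SA | 0.88 | 0.91 | 0.89 | 65 |
| macro avg | 0.92 | 0.92 | 0.92 | 4055 |
| weighted avg | 0.96 | 0.96 | 0.96 | 4055 |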

Overall Accuracy = 96%
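As a sanity check on how the summary rows relate to the per-class rows, the sketch below recomputes the macro and support-weighted averages from the table. The per-class values are already rounded to two decimals, so the recomputed weighted averages can differ from the reported 0.96 in the second decimal place. `REPORT`, `macro_average`, and `weighted_average` are illustrative names.

```python
# (class, precision, recall, f1, support), copied from the table above.
REPORT = [
    ("BL", 0.89, 0.85, 0.87, 85),
    ("BR", 0.94, 0.97, 0.96, 140),
    ("BT", 0.80, 0.84, 0.82, 94),
    ("CR", 0.91, 0.87, 0.89, 334),
    ("ES", 0.89, 0.89, 0.89, 128),
    ("GA", 0.97, 0.99, 0.98, 572),
    ("HC", 0.95, 0.95, 0.95, 215),
    ("LU", 0.93, 0.94, 0.94, 345),
    ("NC", 0.99, 0.99, 0.99, 1606),
    ("OV", 0.84, 0.87, 0.86, 85),
    ("PA", 0.98, 0.91, 0.94, 175),
    ("PR", 0.94, 0.98, 0.96, 211),
    ("SA", 0.88, 0.91, 0.89, 65),
]

# Column indices into each REPORT row.
PRECISION, RECALL, F1, SUPPORT = 1, 2, 3, 4


def macro_average(rows, col):
    """Unweighted mean of a metric over classes."""
    return sum(row[col] for row in rows) / len(rows)


def weighted_average(rows, col):
    """Mean of a metric over classes, weighted by class support."""
    total = sum(row[SUPPORT] for row in rows)
    return sum(row[col] * row[SUPPORT] for row in rows) / total


def total_support(rows):
    """Number of validation samples covered by the report."""
    return sum(row[SUPPORT] for row in rows)
```

The support-weighted recall equals overall accuracy, which is why it lines up with the 96% figure. The macro averages are lower because small classes such as BT and OV count as much as the large NC class.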

Citation

If using this model, please cite:

License

MIT License
