Multi-output DNA Structure Regressor (PyTorch)
Description
This model is a multi-output DNA structure regressor built and trained from scratch in PyTorch.
It predicts six structural stability metrics directly from engineered DNA sequence features: minimum free energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops.
Trained on the [aedupuga/2025-scaffold-structures] dataset, the model is intended as a fast, lightweight alternative to slower, more complex structure-prediction tools such as NUPACK, enabling near-instant predictions for plasmid stability analysis.
Model
- Architecture: MLP with three hidden layers (512 → 256 → 128 units, dropout 0.3)
- Inputs: 109,658 features
- Outputs: 6 targets → mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, num_internal_loops
- Loss: MSE
- Optimizer: Adam (lr=0.0001)
- Epochs: 15
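A minimal PyTorch sketch of the configuration listed above. The hidden widths, dropout rate, MSE loss, Adam optimizer, learning rate, and epoch count come from this card; the ReLU activations, dropout after every hidden layer, the function names `build_model` and `train_one_epoch`, and the batching are assumptions, not the author's training code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_model(in_features: int = 109658, n_targets: int = 6) -> nn.Module:
    """Hypothetical MLP: 512 -> 256 -> 128 hidden units, dropout 0.3.

    ReLU activations and dropout after each hidden layer are assumptions.
    """
    return nn.Sequential(
        nn.Linear(in_features, 512), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(128, n_targets),  # linear outputs, one per regression target
    )


def train_one_epoch(model, loader, optimizer, loss_fn) -> float:
    """Run one pass over `loader` and return the mean batch loss."""
    model.train()  # enables dropout
    total, batches = 0.0, 0
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # MSE averaged over samples and all 6 targets
        loss.backward()
        optimizer.step()
        total += loss.item()
        batches += 1
    return total / max(batches, 1)


# Smoke test on random data; in_features is shrunk to 32 to keep it cheap.
torch.manual_seed(0)
model = build_model(in_features=32)
data = TensorDataset(torch.randn(64, 32), torch.randn(64, 6))
loader = DataLoader(data, batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the card
loss_fn = nn.MSELoss()
losses = [train_one_epoch(model, loader, optimizer, loss_fn) for _ in range(15)]
```

With the default 109,658 inputs, the first linear layer alone holds about 56 million weights, which is why the demo at the end uses a much smaller input width.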
Metrics (test set)
- Overall MSE: 15022.6787
- Overall R²: -34.0313
- Training time (s): 131.85
- Prediction time (s): 0.2694
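The card does not say how the six per-target scores are combined into one R². Assuming the common convention of averaging the coefficient of determination uniformly over the K = 6 targets (the default `multioutput="uniform_average"` behaviour of scikit-learn's `r2_score`), the overall score would be:

```latex
R^2 = \frac{1}{K}\sum_{k=1}^{K}\left(1 - \frac{\sum_{i=1}^{n}\left(y_{ik}-\hat{y}_{ik}\right)^2}{\sum_{i=1}^{n}\left(y_{ik}-\bar{y}_{k}\right)^2}\right), \qquad K = 6
```

Each term compares the model's squared error with that of always predicting the test-set mean \bar{y}_k, so any value below 0 means the model does worse than that constant baseline. Under this reading, an overall R² of -34.03 means the residual-to-baseline error ratio averages about 35 across the targets.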
MAE per target
{
"mfe_energy": 139.4054718017578,
"num_pairs": 116.53337097167969,
"stem_len_mean": 2.4054114818573,
"num_stems": 69.17422485351562,
"num_hairpins": 14.115099906921387,
"num_internal_loops": 94.97564697265625
}
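A minimal NumPy sketch of how the per-target MAE (in the same dictionary form as above) and an overall MSE could be computed from test-set predictions. The target order follows the card; treating overall MSE as the mean over all samples and all six targets, and the function name `evaluate`, are assumptions.

```python
import numpy as np

# Column order assumed to match the card's output list.
TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Return overall MSE and per-target MAE for arrays of shape (n_samples, 6).

    Assumes both arrays are in the original (unscaled) target units and that
    column k corresponds to TARGETS[k].
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if (y_true.shape != y_pred.shape or y_true.ndim != 2
            or y_true.shape[1] != len(TARGETS)):
        raise ValueError("expected two arrays of shape (n_samples, 6)")
    err = y_pred - y_true
    overall_mse = float(np.mean(err ** 2))  # averaged over samples and targets
    mae = np.mean(np.abs(err), axis=0)      # one value per target column
    return {
        "overall_mse": overall_mse,
        "mae_per_target": {name: float(v) for name, v in zip(TARGETS, mae)},
    }
```

If the targets were scaled during training, predictions would need to be mapped back to original units before calling a function like this, otherwise the MAE values would not be comparable to those reported above.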
Usage
pip install torch numpy
python inference.py
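Before running inference, apply the same preprocessing (e.g., scaling, SVD) that was used during training, reusing the fitted parameters rather than refitting them on new data.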
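The card names scaling and SVD but not their exact settings. A minimal NumPy sketch, assuming per-feature standardization followed by a truncated SVD projection fitted once on the training features; the point it illustrates is that inference reuses the stored training statistics. The class name `FittedPreprocessor`, the number of components, and standardizing before the SVD are all hypothetical.

```python
import numpy as np


class FittedPreprocessor:
    """Hypothetical standardize-then-SVD pipeline; fit on training data only."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, x_train: np.ndarray) -> "FittedPreprocessor":
        x_train = np.asarray(x_train, dtype=float)
        self.mean_ = x_train.mean(axis=0)
        std = x_train.std(axis=0)
        self.std_ = np.where(std > 0, std, 1.0)  # constant features: avoid divide-by-zero
        z = (x_train - self.mean_) / self.std_
        # Rows of vt are right singular vectors; keep the top n_components.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        # Reuse the training mean, std, and SVD basis; nothing is refitted here.
        z = (np.asarray(x, dtype=float) - self.mean_) / self.std_
        return z @ self.components_.T
```

A dense `np.linalg.svd` is used here for clarity; with around 109,658 features and many samples, a truncated or randomized solver (for example scikit-learn's `TruncatedSVD`) is likely more practical.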
Limitations
- Performance is less reliable for shorter DNA strands, as the training data primarily consists of longer plasmid sequences.
- The model is intended for educational and exploratory research use, not for experimental or clinical validation.