# Multi-output DNA Structure Regressor (PyTorch)
## Description
This model is a **multi-output DNA structure regressor** built and trained from scratch in **PyTorch**.
It predicts six structural stability metrics directly from engineered DNA sequence features: Minimum Free Energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops.
Trained on the [aedupuga/2025-scaffold-structures] dataset, the model provides a fast, lightweight alternative to more complex and time-consuming simulation tools like **NUPACK**, enabling near-instant predictions for plasmid stability analysis.
## Model
- **Architecture:** 3-layer MLP (512→256→128, dropout 0.3)
- **Inputs:** 109658 features
- **Outputs:** 6 targets → mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, num_internal_loops
- **Loss:** MSE
- **Optimizer:** Adam (lr=0.0001)
- **Epochs:** 15
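The configuration above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the training script itself: the ReLU activations, the placement of dropout after each hidden layer, the final 6-unit linear head, and the name `build_mlp` are assumptions, since the card only lists layer widths, the dropout rate, the loss, and the optimizer.

```python
import torch
from torch import nn


def build_mlp(in_features: int = 109658, out_features: int = 6,
              dropout: float = 0.3) -> nn.Sequential:
    """Three hidden layers (512 -> 256 -> 128) plus a linear output head.

    ReLU and the dropout placement are assumptions; the card does not state them.
    """
    return nn.Sequential(
        nn.Linear(in_features, 512), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(512, 256), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(128, out_features),
    )


# A small input width keeps this demo light; the card lists 109658 features.
model = build_mlp(in_features=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One illustrative training step on random stand-in data.
x = torch.randn(8, 32)
y = torch.randn(8, 6)
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

With the full 109658-wide input, the first linear layer alone holds about 56 million weights, so most of the model's parameters sit in that layer.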
## Metrics (test)
- Overall MSE: `15022.6787`
- Overall R²: `-34.0313`
- Training time (s): `131.85`
- Prediction time (s): `0.2694`
### MAE per target
```json
{
  "mfe_energy": 139.4054718017578,
  "num_pairs": 116.53337097167969,
  "stem_len_mean": 2.4054114818573,
  "num_stems": 69.17422485351562,
  "num_hairpins": 14.115099906921387,
  "num_internal_loops": 94.97564697265625
}
```
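For reference, metrics of this kind can be computed along the following lines. This is a sketch under assumptions: the card does not say how the overall R² was aggregated, so the uniform average over targets used here is a guess, and `regression_metrics` is a hypothetical helper name.

```python
import numpy as np

TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


def regression_metrics(y_true, y_pred):
    """Return overall MSE, mean-over-targets R², and per-target MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    # Per-target R² = 1 - SS_res / SS_tot, then averaged (assumed aggregation).
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = float(np.mean(1.0 - ss_res / ss_tot))
    mae = dict(zip(TARGETS, np.mean(np.abs(err), axis=0).tolist()))
    return {"mse": mse, "r2": r2, "mae": mae}


# Random stand-in data, only to show the call shape.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(20, 6))
y_pred = y_true + rng.normal(scale=0.5, size=(20, 6))
metrics = regression_metrics(y_true, y_pred)
```

An R² below zero means the squared error is larger than that of always predicting each target's test-set mean.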
## Usage
```bash
pip install torch numpy
python inference.py
```
Apply the same preprocessing used during training (e.g., scaling, SVD) before running inference.
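As a rough illustration of that preprocessing step, the sketch below standardizes raw features with stored statistics, optionally projects them onto stored SVD components, and runs a forward pass with dropout disabled. The contents of `inference.py` are not shown in this card, so the helper names `preprocess` and `predict`, the arrays `mean`, `scale`, and `components`, and the order of scaling before projection are all hypothetical.

```python
import numpy as np
import torch
from torch import nn


def preprocess(features, mean, scale, components=None):
    """Standardize, then optionally project onto stored SVD components."""
    x = (np.asarray(features, dtype=np.float32) - mean) / scale
    if components is not None:
        # (n_samples, n_features) @ (n_features, n_components)
        x = x @ components.T
    return x.astype(np.float32)


@torch.no_grad()
def predict(model, x):
    """Run a forward pass in eval mode so dropout is disabled."""
    model.eval()
    return model(torch.from_numpy(x)).numpy()


# Stand-in data and a stand-in model, only to show the call shape.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 10)).astype(np.float32)
mean = raw.mean(axis=0)
scale = raw.std(axis=0) + 1e-8           # avoid division by zero
components = rng.normal(size=(4, 10)).astype(np.float32)

x = preprocess(raw, mean, scale, components)
model = nn.Linear(4, 6)                  # placeholder for the trained regressor
preds = predict(model, x)                # one row of 6 targets per sequence
```

In practice the statistics and components would be the ones fitted on the training set and saved alongside the model weights, not recomputed on the inference data.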
## Limitations
- Performance is less reliable for shorter DNA strands, as the training data primarily consists of longer plasmid sequences.
- The model is intended for **educational and exploratory research use**, not for experimental or clinical validation.