# Multi-output DNA Structure Regressor (PyTorch)

## Description
This model is a **multi-output DNA structure regressor** built and trained from scratch in **PyTorch**.  
It predicts six structural stability metrics directly from engineered DNA sequence features: minimum free energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops.  
Trained on the `aedupuga/2025-scaffold-structures` dataset, the model is designed as a fast, lightweight alternative to slower, more complex simulation tools such as **NUPACK**, enabling near-instant predictions for plasmid stability analysis.

## Model
- **Architecture:** MLP with three hidden layers (512 → 256 → 128 units, dropout 0.3)
- **Inputs:** 109,658 engineered sequence features
- **Outputs:** 6 targets → mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, num_internal_loops
- **Loss:** MSE
- **Optimizer:** Adam (lr=0.0001)
- **Epochs:** 15
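
The architecture and training settings above can be sketched in PyTorch as follows. This is a minimal reconstruction under stated assumptions, not the author's code: the card does not say which activation is used or where dropout is applied, so ReLU followed by dropout after every hidden layer is assumed, and `DNAStructureMLP`, `train_step`, and `TARGETS` are hypothetical names.

```python
import torch
from torch import nn

# Target order as listed in this model card.
TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


class DNAStructureMLP(nn.Module):
    """Hypothetical re-creation of the described MLP.

    Assumption: ReLU activations and dropout after each hidden layer
    (the model card does not specify either).
    """

    def __init__(self, n_features: int = 109_658, n_targets: int = len(TARGETS),
                 dropout: float = 0.3) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, n_targets),  # linear head: one unbounded output per target
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               loss_fn: nn.Module, xb: torch.Tensor, yb: torch.Tensor) -> float:
    """One gradient update on a mini-batch; returns the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)  # MSE averaged over the batch and all targets
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Small feature count so the demo runs quickly; the card reports 109,658.
    model = DNAStructureMLP(n_features=64)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()
    xb, yb = torch.randn(8, 64), torch.randn(8, len(TARGETS))
    for epoch in range(15):
        loss = train_step(model, optimizer, loss_fn, xb, yb)
    print(f"final batch loss: {loss:.4f}")
```

With the real input width, the first layer alone holds 109,658 × 512 ≈ 56 million weights, so it dominates the parameter count of this sketch.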

## Metrics (test)
- Overall MSE: `15022.6787`
- Overall R²: `-34.0313`
- Training time (s): `131.85`
- Prediction time (s): `0.2694`

### MAE per target
```json
{
  "mfe_energy": 139.4054718017578,
  "num_pairs": 116.53337097167969,
  "stem_len_mean": 2.4054114818573,
  "num_stems": 69.17422485351562,
  "num_hairpins": 14.115099906921387,
  "num_internal_loops": 94.97564697265625
}
```
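
The reported test metrics could be computed along these lines. This is a sketch under an assumption the card does not confirm: "overall MSE" is taken as the mean over every element, and "overall R²" as the per-target R² averaged uniformly across the six targets. `regression_report` is a hypothetical helper.

```python
import numpy as np

TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


def regression_report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Overall MSE, overall R², and per-target MAE for (n_samples, n_targets) arrays.

    Assumption: overall MSE averages over all elements; overall R² is the
    uniform mean of per-target R² values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    overall_mse = float(np.mean(err ** 2))

    # Per-target R² = 1 - SS_res / SS_tot. It is negative whenever the model
    # does worse than always predicting that target's test-set mean.
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2_per_target = 1.0 - ss_res / ss_tot
    overall_r2 = float(np.mean(r2_per_target))

    mae = np.mean(np.abs(err), axis=0)
    mae_per_target = {name: float(v) for name, v in zip(TARGETS, mae)}
    return {"overall_mse": overall_mse, "overall_r2": overall_r2,
            "mae_per_target": mae_per_target}
```

Under this definition, an overall R² of -34 means the predictions are, on average across targets, far worse than a constant mean predictor. Because the targets live on very different scales (compare the MAE of `stem_len_mean` with that of `mfe_energy`), an element-wise MSE is dominated by the large-magnitude targets.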

## Usage
```bash
pip install torch numpy
python inference.py
```

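Before running inference, apply exactly the same preprocessing used during training (e.g., scaling, SVD), reusing the parameters fitted on the training data rather than refitting them on new inputs.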
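
A minimal sketch of what "reuse the training-time preprocessing" means in practice, assuming a standardize-then-SVD pipeline. The card only names scaling and SVD as examples, so the exact steps, their order, the number of components, and the class name `FittedPreprocessor` are all hypothetical.

```python
import numpy as np


class FittedPreprocessor:
    """Hypothetical standardize-then-SVD pipeline fitted once on training data.

    Assumption: the real pipeline's steps, order, and component count may differ.
    """

    def __init__(self, n_components: int) -> None:
        self.n_components = n_components

    def fit(self, X_train: np.ndarray) -> "FittedPreprocessor":
        X = np.asarray(X_train, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        self.std_ = np.where(std > 0, std, 1.0)  # constant features: avoid dividing by zero
        Z = (X - self.mean_) / self.std_
        # Rows of vt are orthonormal directions of the standardized training data.
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Always reuse the training statistics and components; never refit here.
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return Z @ self.components_.T
```

Storing `mean_`, `std_`, and `components_` next to the model weights (for example with `np.savez`) lets an inference script reapply the identical transform. For very wide inputs, a truncated or randomized SVD is usually more practical than `np.linalg.svd`, which is used here only to keep the sketch dependency-free.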

## Limitations
- Performance is less reliable for shorter DNA strands, as the training data primarily consists of longer plasmid sequences.  
- The model is intended for **educational and exploratory research use**, not for experimental or clinical validation.