Multi-output DNA Structure Regressor (PyTorch)

Description

This model is a multi-output DNA structure regressor built and trained from scratch in PyTorch.
It predicts six structural stability metrics directly from engineered DNA sequence features: Minimum Free Energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops.
Trained on the [aedupuga/2025-scaffold-structures] dataset, the model provides a fast, lightweight alternative to more complex and time-consuming simulation tools like NUPACK, enabling near-instant predictions for plasmid stability analysis.

Model

  • Architecture: 3-layer MLP (512→256→128, dropout 0.3)
  • Inputs: 109658 features
  • Outputs: 6 targets → mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, num_internal_loops
  • Loss: MSE
  • Optimizer: Adam (lr=0.0001)
  • Epochs: 15
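
The configuration above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the author's training code: the ReLU activations, the placement of dropout after each hidden layer, the final linear output head, and the names `StructureRegressor` and `train_one_epoch` are all assumptions made for the example.

```python
import torch
from torch import nn


class StructureRegressor(nn.Module):
    """Hypothetical MLP: in_features -> 512 -> 256 -> 128 -> n_targets."""

    def __init__(self, in_features: int = 109658, n_targets: int = 6,
                 dropout: float = 0.3):
        super().__init__()
        layers = []
        prev = in_features
        for width in (512, 256, 128):
            # Assumed block layout: Linear -> ReLU -> Dropout
            layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)]
            prev = width
        # Plain linear head, one output per regression target
        layers.append(nn.Linear(prev, n_targets))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over `loader` and return the mean batch loss."""
    model.train()  # enables dropout
    total, n_batches = 0.0, 0
    for features, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
        n_batches += 1
    return total / max(n_batches, 1)
```

Under the listed settings, the optimizer and loss would be `torch.optim.Adam(model.parameters(), lr=1e-4)` and `nn.MSELoss()`, with `train_one_epoch` called 15 times. With 109658 inputs, the first linear layer alone holds about 56 million weights (109658 × 512), so it dominates the model's size.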

Metrics (test)

  • Overall MSE: 15022.6787
  • Overall R²: -34.0313
  • Training time (s): 131.85
  • Prediction time (s): 0.2694
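
For readers who want to reproduce figures of this kind, here is a minimal NumPy sketch of an overall MSE and R². How the card aggregates across the six targets is not stated, so the uniform average of per-target R² used here is an assumption, and the function names are hypothetical. An R² below zero means the predictions fit worse than always predicting each target's mean.

```python
import numpy as np


def overall_mse(y_true, y_pred):
    """Mean squared error over all samples and all targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def overall_r2(y_true, y_pred):
    """Uniform average of per-target R^2 (assumed aggregation).

    Arrays have shape (n_samples, n_targets). A target with zero
    variance in y_true would divide by zero and needs special handling.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))
```

Because the targets live on very different scales, an MSE pooled over all of them is dominated by the targets with the largest values.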

MAE per target

{
  "mfe_energy": 139.4054718017578,
  "num_pairs": 116.53337097167969,
  "stem_len_mean": 2.4054114818573,
  "num_stems": 69.17422485351562,
  "num_hairpins": 14.115099906921387,
  "num_internal_loops": 94.97564697265625
}
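
A per-target MAE dictionary like the one above could be computed along these lines. This is a sketch assuming predictions and ground truth are arrays of shape (n_samples, 6) with columns in the order listed under Outputs; `mae_per_target` is a hypothetical helper name.

```python
import numpy as np

# Column order assumed to match the Outputs list in the Model section
TARGETS = [
    "mfe_energy", "num_pairs", "stem_len_mean",
    "num_stems", "num_hairpins", "num_internal_loops",
]


def mae_per_target(y_true, y_pred, names=TARGETS):
    """Return {target_name: mean absolute error} for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = np.mean(np.abs(y_true - y_pred), axis=0)
    return {name: float(err) for name, err in zip(names, errors)}
```

Each MAE is expressed in its own target's units, so the values are not directly comparable across targets.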

Usage

pip install torch numpy
python inference.py

Before running inference, apply the same preprocessing that was used during training (e.g., scaling, SVD).
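
The contents of `inference.py` are not shown here, so the following is only a sketch of what preprocessing followed by prediction might look like. The arrays `mean`, `scale`, and `components` stand in for statistics saved at training time, and both function names are hypothetical.

```python
import numpy as np
import torch


def preprocess(features, mean, scale, components=None):
    """Standardize with training-time statistics, then optionally project.

    `components` is assumed to have shape (n_components, n_features),
    as produced by a truncated SVD fitted on the training set.
    """
    x = (np.asarray(features, dtype=np.float32) - mean) / scale
    if components is not None:
        x = x @ components.T
    return x.astype(np.float32)


def predict(model, features):
    """Run a forward pass without gradients; returns a NumPy array."""
    model.eval()  # disables dropout at inference time
    with torch.no_grad():
        return model(torch.from_numpy(features)).numpy()
```

The statistics must come from the training set; recomputing them on new sequences would shift the inputs away from what the model saw during training.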

Limitations

  • Performance is less reliable for shorter DNA strands, as the training data primarily consists of longer plasmid sequences.
  • The model is intended for educational and exploratory research use, not for experimental or clinical validation.