PanDrugTransformer Model Card

Model Overview

PanDrugTransformer is a sequence-to-value regression model that predicts stop-codon readthrough from a nucleotide sequence and a drug context. The weights are hosted on Hugging Face as Dichopsis/TransStop.

Architecture: Custom transformer with cross-attention between nucleotide sequence and drug embedding, plus a regression head.
Base Model: InstaDeepAI/nucleotide-transformer-v2-500m-multi-species
Purpose: Predict readthrough rates for given nucleotide sequences and drug conditions.
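
To make the cross-attention step concrete, here is a minimal, dependency-free sketch of scaled dot-product attention followed by a linear regression head. It is a toy illustration of the arithmetic, not the model's implementation: treating the drug embedding as the query over per-token sequence embeddings, omitting learned projections, and every name and number below are assumptions, since this card does not describe the actual layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  d-dimensional vector (assumed here: a drug embedding)
    keys:   list of d-dimensional vectors (assumed here: token embeddings)
    values: list of vectors, one per key
    Returns (context_vector, attention_weights).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

def regression_head(context, head_weights, bias=0.0):
    """Linear map from the attended context vector to one scalar."""
    return sum(c * w for c, w in zip(context, head_weights)) + bias

# Toy numbers for illustration only, not real model weights.
drug_embedding = [1.0, 0.0]
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

context, weights = cross_attend(drug_embedding, token_embeddings, token_embeddings)
prediction = regression_head(context, head_weights=[0.5, -0.5], bias=0.1)
```

The attention weights sum to one, so the context vector is a weighted average of the token embeddings, with tokens most similar to the drug embedding contributing the most.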

Training Procedure

Hyperparameter Optimization: Optuna was used to tune the model's hyperparameters.
Final Training: The best hyperparameters found were used for the final training run on the processed splits.
Evaluation Metrics: R² (coefficient of determination) on validation/test sets.
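
For reference, R² can be computed directly from measured and predicted values as 1 - SS_res / SS_tot. The sketch below is a plain-Python version of that formula (the same quantity `sklearn.metrics.r2_score` returns for a single output); the function name and the example numbers are hypothetical and are not results from this model.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only.
measured = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]
score = r2_score(measured, predicted)
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean of the targets, and negative values mean worse than that baseline.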

Data

Splits: Model trained and evaluated on processed train/validation/test splits.
Features: Each sample includes a nucleotide sequence and a drug name column (embedded for cross-attention).
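
One common way to make a drug name column usable by an embedding layer is to map each distinct name to an integer index. The sketch below assumes hypothetical column names (`sequence`, `drug`, `readthrough`), a placeholder second drug (`DrugB`), and placeholder target values; the card does not specify the actual schema or how the model encodes drug names.

```python
def build_drug_vocab(rows, drug_key="drug"):
    """Assign an integer index to each distinct drug name, in sorted order."""
    names = sorted({row[drug_key] for row in rows})
    return {name: idx for idx, name in enumerate(names)}

def encode_rows(rows, vocab, drug_key="drug"):
    """Return (sequence, drug_index, target) triples."""
    return [
        (row["sequence"], vocab[row[drug_key]], row["readthrough"])
        for row in rows
    ]

# Placeholder rows: the targets are dummy values, not measurements.
rows = [
    {"sequence": "CGTTGGTAGCCAATT", "drug": "Clitocine", "readthrough": 0.0},
    {"sequence": "CGTTGGTAGCCAATT", "drug": "DrugB", "readthrough": 0.0},
]

vocab = build_drug_vocab(rows)
encoded = encode_rows(rows, vocab)
```

Building the vocabulary from the training split only, and reusing it for validation and test, keeps the indices consistent across splits.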

Usage Instructions

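import torch
from transformers import AutoModel, AutoTokenizer

# Depending on your transformers version, loading may also require
# trust_remote_code=True if the repositories ship custom code.
model = AutoModel.from_pretrained("Dichopsis/TransStop")
tokenizer = AutoTokenizer.from_pretrained("InstaDeepAI/nucleotide-transformer-v2-500m-multi-species")
model.eval()  # Disable dropout for inference

# Example input: 6 nt upstream, the stop codon (TAG), 6 nt downstream
sequence = "CGTTGGTAGCCAATT"
drug_name = "Clitocine"  # Format as required by model

inputs = tokenizer(sequence, return_tensors="pt")

# Pass the drug name alongside the tokenized sequence, as required by the model's API
with torch.no_grad():
    outputs = model(**inputs, drug_name=drug_name)
prediction = outputs.logits.item()  # Regression output (predicted readthrough)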

Notes for Hugging Face Users

Drug Embedding: Drug name is embedded and integrated via cross-attention.
Regression Head: Model outputs a continuous value.
Compatibility: Requires a 15nt nucleotide sequence (6nt-STOP-6nt) and drug name input.
Evaluation: R² reported for validation/test splits.
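
Because the model expects a fixed 6nt-STOP-6nt layout, it can help to validate inputs before tokenizing them. The helper below is a hypothetical sketch, not part of the model's API. It assumes a DNA alphabet (A, C, G, T) and that the central codon must be one of the three standard stop codons; whether the model also accepts lowercase or RNA-style input is not stated in this card.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def validate_context(sequence):
    """Check the 6nt-STOP-6nt layout and return the uppercased sequence."""
    seq = sequence.strip().upper()
    if len(seq) != 15:
        raise ValueError(f"expected 15 nt, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    # Positions 7-9 (0-based slice 6:9) hold the stop codon.
    if seq[6:9] not in STOP_CODONS:
        raise ValueError(f"positions 7-9 must be a stop codon, got {seq[6:9]}")
    return seq

checked = validate_context("cgttggtagccaatt")
```

Failing early with a clear message is usually easier to debug than a silent prediction on a malformed sequence.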

Model Size: 498M parameters (Safetensors, F32).
