Update README.md
README.md
CHANGED
@@ -4,4 +4,50 @@ metrics:
- r_squared
base_model:
- InstaDeepAI/nucleotide-transformer-v2-500m-multi-species
---

# PanDrugTransformer Model Card

## Model Overview

**PanDrugTransformer** is a sequence-to-value regression model designed to predict `RT_transformed` from nucleotide sequences and drug context.

- **Architecture:** Custom transformer with cross-attention between nucleotide sequence and drug embedding, plus a regression head.
- **Base Model:** [`InstaDeepAI/nucleotide-transformer-v2-500m-multi-species`](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species)
- **Purpose:** Predict readthrough rates for given nucleotide sequences and drug conditions.
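
The architecture bullet above can be illustrated with a small PyTorch module. This is only a sketch under stated assumptions, not the released implementation: the class name `DrugCrossAttentionRegressor`, the choice to use the drug embedding as the attention query over the sequence states, the layer sizes, and the number of drugs are all hypothetical.

```python
import torch
from torch import nn


class DrugCrossAttentionRegressor(nn.Module):
    """Hypothetical sketch: a drug embedding attends over sequence token states."""

    def __init__(self, hidden_dim: int, num_drugs: int, num_heads: int = 4):
        super().__init__()
        # One learned vector per drug (vocabulary size is an assumption).
        self.drug_embedding = nn.Embedding(num_drugs, hidden_dim)
        # Query = drug embedding, keys/values = sequence token states.
        self.cross_attention = nn.MultiheadAttention(
            hidden_dim, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(hidden_dim)
        self.regression_head = nn.Linear(hidden_dim, 1)

    def forward(self, sequence_states: torch.Tensor, drug_ids: torch.Tensor) -> torch.Tensor:
        # sequence_states: (batch, seq_len, hidden_dim), e.g. encoder hidden states.
        # drug_ids: (batch,) integer index of the drug for each sample.
        drug = self.drug_embedding(drug_ids).unsqueeze(1)  # (batch, 1, hidden_dim)
        attended, _ = self.cross_attention(
            query=drug, key=sequence_states, value=sequence_states
        )
        fused = self.norm(drug + attended).squeeze(1)  # (batch, hidden_dim)
        return self.regression_head(fused).squeeze(-1)  # (batch,) continuous values
```

In this arrangement the drug embedding acts as a learned pooling query, so the same sequence can yield different predictions for different drugs. The real model may wire the attention differently.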

## Training Procedure

- **Hyperparameter Optimization:** Optuna was used to tune the model's hyperparameters.
- **Final Training:** The best hyperparameters were then used for a full training run on the processed splits.
- **Evaluation Metrics:** R² (coefficient of determination) on the validation and test sets.

## Data

- **Splits:** The model was trained and evaluated on processed train/validation/test splits.
- **Features:** Each sample includes a nucleotide sequence and a drug name column (embedded for cross-attention).
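
Since the drug name is a categorical column that gets embedded, it typically has to be mapped to an integer index first. The helpers below are a minimal sketch of that step, not the model's actual preprocessing: `build_drug_vocab` and `encode_drugs` are hypothetical names, and `"DrugB"` is a placeholder rather than a drug from the dataset.

```python
def build_drug_vocab(drug_names):
    """Map each distinct drug name to a stable integer index (sorted order)."""
    unique_names = sorted({name.strip() for name in drug_names})
    return {name: index for index, name in enumerate(unique_names)}


def encode_drugs(drug_names, vocab):
    """Convert drug names to indices, failing loudly on unseen drugs."""
    encoded = []
    for name in drug_names:
        key = name.strip()
        if key not in vocab:
            raise ValueError(f"Unknown drug name: {key!r}")
        encoded.append(vocab[key])
    return encoded


# "DrugB" is a placeholder, not a drug taken from the training data.
vocab = build_drug_vocab(["Clitocine", "DrugB", "Clitocine"])
drug_ids = encode_drugs(["DrugB", "Clitocine"], vocab)
```

Building the vocabulary from the training split only, and rejecting unseen names at inference time, avoids silently indexing into an untrained embedding row.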

## Usage Instructions

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The fine-tuned model and the base model's tokenizer are loaded separately.
model = AutoModel.from_pretrained("Dichopsis/TransStop")
tokenizer = AutoTokenizer.from_pretrained("InstaDeepAI/nucleotide-transformer-v2-500m-multi-species")
model.eval()  # disable dropout for inference

# Example input
sequence = "CGTTGGTAGCCAATT"  # 15 nt: 6 nt upstream, stop codon (TAG), 6 nt downstream
drug_name = "Clitocine"  # Format as required by the model

inputs = tokenizer(sequence, return_tensors="pt")
# Pass the drug name as required by the model's API
with torch.no_grad():  # no gradients are needed for prediction
    outputs = model(**inputs, drug_name=drug_name)
prediction = outputs.logits.item()  # Regression output
```
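
Before calling the model, it can help to check that the input matches the 6nt-STOP-6nt layout. The validator below is a sketch under assumptions: it accepts only the DNA alphabet (`A`, `C`, `G`, `T`) and the three standard stop codons, and `validate_context` is a hypothetical helper, not part of the model's API.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons, DNA alphabet


def validate_context(sequence: str) -> str:
    """Return the normalized 15 nt sequence or raise ValueError."""
    seq = sequence.strip().upper()
    if len(seq) != 15:
        raise ValueError(f"Expected 15 nucleotides, got {len(seq)}")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"Unexpected characters: {sorted(invalid)}")
    # Positions 7-9 (0-based slice 6:9) must hold the stop codon.
    if seq[6:9] not in STOP_CODONS:
        raise ValueError(f"No stop codon at positions 7-9: {seq[6:9]!r}")
    return seq
```

For the example input, `validate_context("CGTTGGTAGCCAATT")` returns the sequence unchanged, because `TAG` sits in the middle three positions.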

## Notes for Hugging Face Users

- **Drug Embedding:** The drug name is embedded and integrated via cross-attention.
- **Regression Head:** The model outputs a single continuous value.
- **Compatibility:** Requires a 15nt nucleotide sequence (6nt-STOP-6nt) and a drug name as input.
- **Evaluation:** R² is reported for the validation and test splits.
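
For reference, R² is commonly computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the targets from their mean. The function below is a plain-Python sketch of that standard definition. Whether the reported numbers were produced with this exact formula or with a library routine is not stated here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

A perfect predictor scores 1.0, predicting the mean of the targets for every sample scores 0.0, and worse-than-mean predictions give negative values.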