# 🧠 NTxPred2: A large language model for predicting neurotoxic peptides and neurotoxins
NTxPred2 is a fine-tuned transformer model built on top of the ESM2-t30_150M_UR50D protein language model. It is trained for binary classification of peptide sequences: predicting whether a peptide is neurotoxic or non-toxic.
🎯 Use Case: Accelerating the identification and design of safe peptide therapeutics by filtering out neurotoxic candidates early in the drug development pipeline.
## 🖼️ NTxPred2 Workflow
## 🧬 Model Highlights
- Base Model: Facebook's ESM2-t30 (150M parameters)
- Fine-Tuning Task: Neurotoxicity prediction (binary classification)
- Input: Short peptide sequences (7–50 amino acids)
- Output: Binary label: `1` (neurotoxic), `0` (non-toxic)
- Architecture: ESM2 encoder + linear classification head
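
The stated input range (7 to 50 amino acids) can be checked before a sequence is sent to the model. The following is a minimal sketch and not part of NTxPred2 itself: the helper name `is_valid_peptide` and the restriction to the 20 standard amino acids are assumptions made for illustration.

```python
# Hypothetical pre-filter; not part of the NTxPred2 codebase.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only
MIN_LEN, MAX_LEN = 7, 50  # length range stated in the model highlights


def is_valid_peptide(sequence: str) -> bool:
    """Return True if the sequence is 7-50 residues long and uses only standard amino acids."""
    seq = sequence.strip().upper()
    return MIN_LEN <= len(seq) <= MAX_LEN and set(seq) <= STANDARD_AMINO_ACIDS


# Illustrative inputs: one valid peptide, one too short, one with a non-standard residue (X).
candidates = ["ACDEFGHIKLMNPQRSTVWY", "ACD", "ACDEFGHIKLMNPQRSTVWYX"]
valid = [s for s in candidates if is_valid_peptide(s)]
print(valid)
```

Filtering up front keeps out-of-range inputs from silently receiving a score the model was not trained to produce.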
## Files Included
- `config.json`: Contains configuration settings for the model architecture, hyperparameters, and training details.
- `model.safetensors`: The trained model weights, saved in the SafeTensors format, which is safer and faster to load than traditional pickle-based .bin files.
- `special_tokens_map.json`: Stores mappings for special tokens, such as [CLS], [SEP], or any custom tokens used by the tokenizer.
- `tokenizer_config.json`: Contains tokenizer-related settings (such as vocabulary size and tokenization method).
- `vocab.txt`: Lists all tokens and their corresponding IDs; it is essential for tokenization.
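## How to Use
### 🔧 Install Dependencies
`pip install torch fair-esm biopython huggingface_hub`

The `esm` module imported below (with `esm.pretrained`) is provided by the `fair-esm` package on PyPI.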
### Loading the Model from Hugging Face
```python
import torch
import torch.nn as nn
import esm
import json
from huggingface_hub import hf_hub_download


# Define the classifier model (ESM encoder + linear head)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super().__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        layer_index = len(self.esm_model.layers)  # Index of the final encoder layer
        results = self.esm_model(tokens, repr_layers=[layer_index])
        # Mean-pool the per-token representations over the sequence dimension
        embeddings = results["representations"][layer_index].mean(1)
        return self.fc(embeddings)


# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load config from the model repo
config_path = hf_hub_download(repo_id="anandr88/NTxPred2", filename="config.json")
with open(config_path, "r") as f:
    config = json.load(f)

# Load the pretrained ESM2 backbone and its alphabet
model_name = "esm2_t30_150M_UR50D"
esm_model, alphabet = esm.pretrained.load_model_and_alphabet(model_name)
batch_converter = alphabet.get_batch_converter()

# Initialize a NEW classifier. Note: the linear head (`fc`) starts with random
# weights; this snippet does not load the fine-tuned weights from model.safetensors.
classifier = ProteinClassifier(
    esm_model,
    embedding_dim=config["embedding_dim"],
    num_classes=config["num_classes"],
)
classifier.to(device)
classifier.eval()

print("✅ Model loaded successfully!")
print(f"Using device: {device}")
print(f"Model architecture: {classifier}")
```

### 🧪 Example Usage (Optional)

```python
# Example usage for binary classification
sequence = ("TEST_SEQUENCE", "ACDEFGHIKLMNPQRSTVWY")  # (label, peptide sequence)

# Convert to model input format
_, _, batch_tokens = batch_converter([sequence])
batch_tokens = batch_tokens.to(device)

# Predict
with torch.no_grad():
    logits = classifier(batch_tokens)
    # Sigmoid for binary classification; .item() requires a single output value
    probability = torch.sigmoid(logits).item()

# Interpret results
threshold = 0.5  # Standard threshold (adjust if needed)
prediction = "Neurotoxic" if probability >= threshold else "Non-toxic"

print("\n" + "=" * 50)
print(f"Input Sequence: {sequence[1]}")
print(f"Neurotoxicity Probability: {probability:.4f}")
print(f"Prediction: {prediction} (threshold={threshold})")
```
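
The sigmoid-and-threshold step can be reproduced in plain Python to see how the cutoff changes the assigned label. This is a minimal sketch under stated assumptions: `classify_logit` is a hypothetical helper, and the logit values are made up for illustration rather than taken from NTxPred2.

```python
import math


def classify_logit(logit: float, threshold: float = 0.5) -> tuple[float, str]:
    """Map a raw logit to (probability, label) using a sigmoid and a cutoff."""
    probability = 1.0 / (1.0 + math.exp(-logit))
    label = "Neurotoxic" if probability >= threshold else "Non-toxic"
    return probability, label


# Illustrative logits only; these are not outputs of NTxPred2.
for logit in (-2.0, 0.0, 1.5):
    prob, label = classify_logit(logit)
    print(f"logit={logit:+.1f}  probability={prob:.4f}  label={label}")

# A stricter cutoff labels fewer peptides as neurotoxic.
print(classify_logit(1.5, threshold=0.9))
```

Raising the threshold trades sensitivity for fewer false positives; which side to favor depends on how costly a missed neurotoxic peptide is in the screening pipeline.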
## Applications
- Neurotoxic peptide filtering in therapeutic design
- Toxicity scanning of synthetic peptides
- Dataset annotation for bioactivity studies
- Educational use in bioinformatics and deep learning for proteins
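
For screening or annotation runs like those listed above, peptides often arrive as FASTA text rather than as a single string. The sketch below parses FASTA text into the `(name, sequence)` tuples used in the usage example. It relies only on the standard library; `parse_fasta` and the example records are hypothetical and serve only as an illustration.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            name = header.split()[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line.upper())  # sequence lines may be wrapped
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Made-up records for illustration only.
fasta_text = """>pep1 example record
ACDEFGHIKL
MNPQRSTVWY
>pep2
GLFDIVKKVVGALG
"""
batch = parse_fasta(fasta_text)
print(batch)
```

The resulting list has the same shape as the single-item list passed to `batch_converter` in the usage example. Biopython's `SeqIO.parse` is an alternative for more robust FASTA handling.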
## Related Links
- 🔬 Project Web Server: NTxPred2 Web Tool
- 🧾 Documentation & Source: GitHub (raghavagps/NTxPred2)
## Citation
Rathore et al. A Large Language Model for Predicting Neurotoxic Peptides and Neurotoxins. (Coming soon)

👨‍🔬 Start using NTxPred2 today to enhance your peptide screening pipeline with the power of transformer-based intelligence!