🧠 NTxPred2: A large language model for predicting neurotoxic peptides and neurotoxins

NTxPred2 is a fine-tuned transformer model built on the ESM2-t30_150M_UR50D protein language model. It is trained for binary classification of peptide sequences: predicting whether a peptide is neurotoxic or non-toxic.

🎯 Use Case: Accelerating the identification and design of safe peptide therapeutics by filtering out neurotoxic candidates early in the drug development pipeline.


🖼️ NTxPred2 Workflow

[Figure: NTxPred2 Workflow]


🧬 Model Highlights

  • Base Model: Facebook’s ESM2-t30 (150M parameters)
  • Fine-Tuning Task: Neurotoxicity prediction (binary classification)
  • Input: Short peptide sequences (7–50 amino acids)
  • Output: Binary label → 1 (neurotoxic), 0 (non-toxic)
  • Architecture: ESM2 encoder + linear classification head
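
Since the model expects short peptides of 7 to 50 residues, it can help to screen inputs before tokenizing them. The following is a minimal sketch of such a check; `is_valid_peptide` is a hypothetical helper, and restricting input to the 20 standard one-letter amino acid codes is an assumption, not something this card specifies.

```python
# Standard 20 amino acids (one-letter codes); an assumption about accepted input.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LEN, MAX_LEN = 7, 50  # length range stated in the model highlights


def is_valid_peptide(sequence: str) -> bool:
    """Return True if the sequence has 7-50 residues, all standard amino acids."""
    seq = sequence.strip().upper()
    return MIN_LEN <= len(seq) <= MAX_LEN and set(seq) <= STANDARD_AMINO_ACIDS


# Try a full-length example, a too-short one, and one with a non-standard letter.
for candidate in ["ACDEFGHIKLMNPQRSTVWY", "ACD", "ACDEFGXIKLMNP"]:
    print(candidate, is_valid_peptide(candidate))
```

Sequences that fail the check can be dropped or reported before they reach the model.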

🗂️ Files Included

  • config.json – Contains configuration settings for the model architecture, hyperparameters, and training details.

  • model.safetensors – The trained model weights, saved in the SafeTensors format, which is safer and faster to load than traditional .bin files.

  • special_tokens_map.json – Stores mappings for special tokens (for example, classification, padding, and end-of-sequence tokens) used by the tokenizer.

  • tokenizer_config.json – Contains tokenizer-related settings (like vocabulary size, tokenization method).

  • vocab.txt – Lists all tokens and their corresponding IDs; it's essential for text tokenization.
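
The loading code in this card reads two keys from config.json, `embedding_dim` and `num_classes`. As a rough sketch, the file can be checked for these keys before the model is built. `load_classifier_config` is a hypothetical helper, and the values in the demo file are illustrative stand-ins, not the contents of the published config.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("embedding_dim", "num_classes")  # keys read by the loading code


def load_classifier_config(path: str) -> dict:
    """Load config.json and fail early if a key the classifier needs is missing."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing required keys: {missing}")
    return config


# Demo with a stand-in file; the values below are illustrative, not the repo's.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w") as f:
        json.dump({"embedding_dim": 640, "num_classes": 1}, f)
    demo_config = load_classifier_config(demo_path)
    print(demo_config)
```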


🚀 How to Use

🔧 Install Dependencies

pip install torch fair-esm biopython huggingface_hub
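
After installation, a quick check can confirm that the packages are importable. This is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list of import names is an assumption derived from the install command (the ESM code is imported as `esm`, Biopython as `Bio`).

```python
import importlib.util

# Import names, which differ from the pip names in two cases (assumed mapping):
# the ESM code is imported as `esm`, and Biopython as `Bio`.
REQUIRED_MODULES = ["torch", "esm", "Bio", "huggingface_hub"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```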


### Loading the Model from Hugging Face

```python
import torch
import torch.nn as nn
import esm
import json
from huggingface_hub import hf_hub_download

# Define the classifier model (ESM encoder + linear head)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super(ProteinClassifier, self).__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        layer_index = len(self.esm_model.layers)  # Get number of layers
        results = self.esm_model(tokens, repr_layers=[layer_index])
        embeddings = results["representations"][layer_index].mean(1)
        return self.fc(embeddings)

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load config from your repo
config_path = hf_hub_download(repo_id="anandr88/NTxPred2", filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)

# Load the pretrained ESM2 encoder and its alphabet
model_name = "esm2_t30_150M_UR50D"
esm_model, alphabet = esm.pretrained.load_model_and_alphabet(model_name)
batch_converter = alphabet.get_batch_converter()

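# Build the classifier around the ESM2 encoder.
# Note: the linear head starts with random weights; this snippet does not
# load the fine-tuned weights from model.safetensors.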
classifier = ProteinClassifier(
    esm_model, 
    embedding_dim=config['embedding_dim'], 
    num_classes=config['num_classes']
)
classifier.to(device)
classifier.eval()

print("✅ Model loaded successfully!")
print(f"Using device: {device}")
print(f"Model architecture: {classifier}")
```

🧪 Example Usage (Optional)

```python
# Example usage for binary classification
sequence = ("TEST_SEQUENCE", "ACDEFGHIKLMNPQRSTVWY")  # (label, peptide sequence)

# Convert to model input format
_, _, batch_tokens = batch_converter([sequence])
batch_tokens = batch_tokens.to(device)

# Predict
with torch.no_grad():
    logits = classifier(batch_tokens)
    if logits.shape[-1] == 1:
        # Single-logit head: sigmoid gives P(neurotoxic)
        probability = torch.sigmoid(logits).item()
    else:
        # Two-logit head: softmax, then take class 1 (neurotoxic)
        probability = torch.softmax(logits, dim=-1)[0, 1].item()

# Interpret results
threshold = 0.5  # Standard threshold (adjust if needed)
prediction = "Neurotoxic" if probability >= threshold else "Non-toxic"

print("\n" + "="*50)
print(f"🔬 Input Sequence: {sequence[1]}")
print(f"📊 Neurotoxicity Probability: {probability:.4f}")
print(f"🏷️ Prediction: {prediction} (threshold={threshold})")
```
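
The loading code above builds the linear head with freshly initialized weights and never reads model.safetensors, so its predictions would not reflect the fine-tuning. The sketch below shows one way the published weights might be loaded. It assumes the `safetensors` package is installed (it is not in the install command) and that the tensor names in the file match the `ProteinClassifier` state dict; neither is confirmed by this card. `load_finetuned_weights` and `weights_look_loaded` are hypothetical helpers.

```python
def load_finetuned_weights(classifier, repo_id="anandr88/NTxPred2",
                           filename="model.safetensors"):
    """Download the weight file and copy matching tensors into `classifier`.

    Assumption: tensor names in the file match the classifier's state dict.
    Returns (missing_keys, unexpected_keys) so mismatches are visible.
    """
    # Imported lazily so the helper can be defined without the heavy packages.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    weights_path = hf_hub_download(repo_id=repo_id, filename=filename)
    state_dict = load_file(weights_path)  # dict: tensor name -> torch.Tensor
    # strict=False reports mismatched names instead of raising an error.
    report = classifier.load_state_dict(state_dict, strict=False)
    return report.missing_keys, report.unexpected_keys


def weights_look_loaded(missing_keys, unexpected_keys):
    """True only when every tensor name matched in both directions."""
    return not missing_keys and not unexpected_keys
```

Calling `load_finetuned_weights(classifier)` after the classifier is built and passing the two returned lists to `weights_look_loaded` gives a quick signal: if either list is non-empty, the names did not line up and the head may still be random.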

📊 Applications

  • Neurotoxic peptide filtering in therapeutic design
  • Toxicity scanning of synthetic peptides
  • Dataset annotation for bioactivity studies
  • Educational use in bioinformatics and deep learning for proteins
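
For the filtering and scanning use cases, a batch of candidates is typically read from a FASTA file, scored, and split at a threshold. The sketch below illustrates that flow with the standard library only. `read_fasta`, `screen`, and `dummy_score` are hypothetical names; `dummy_score` is an arbitrary stand-in with no biological meaning and would be replaced by a call to the classifier. Biopython's `SeqIO.parse` could replace the small parser.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def screen(records, score_fn, threshold=0.5):
    """Split records into (kept, flagged) by predicted neurotoxicity probability."""
    kept, flagged = [], []
    for name, seq in records:
        probability = score_fn(seq)
        target = flagged if probability >= threshold else kept
        target.append((name, seq, probability))
    return kept, flagged


def dummy_score(sequence):
    """Stand-in scorer: arbitrary rule, no biological meaning."""
    return 0.9 if len(sequence) % 2 == 0 else 0.1


fasta_text = ">pep1\nACDEFGHIK\n>pep2\nLMNPQRSTVW\n"
kept, flagged = screen(read_fasta(fasta_text), dummy_score)
print("kept:", [name for name, _, _ in kept])
print("flagged:", [name for name, _, _ in flagged])
```

Passing the scorer as a function keeps the screening logic independent of how the probability is produced.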

🌐 Related Links


🧠 Citation

📖 Rathore et al.
A Large Language Model for Predicting Neurotoxic Peptides and Neurotoxins.
#Coming Soon#


👨‍🔬 Start using NTxPred2 today to enhance your peptide screening pipeline with the power of transformer-based intelligence!
