AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-secondary-structure

AmelieSchreiber committed on Aug 6, 2023

Commit 0a63d8c · 1 Parent(s): ec558c8

Create README.md

Files changed (1)

README.md +51 -0

README.md ADDED

	@@ -0,0 +1,51 @@

+---
+license: mit
+language:
+- en
+library_name: transformers
+tags:
+- esm
+- esm2
+- protein language model
+- biology
+- protein token classification
+- secondary structure prediction
+---
+# ESM-2 for Token Classification
+## Using the Model
+To run inference on a protein sequence, use the following:
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+import numpy as np
+import torch
+# 1. Prepare the Model and Tokenizer
+# Replace with the path to your own saved model if you trained a new one
+model_dir = "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-secondary-structure"
+model = AutoModelForTokenClassification.from_pretrained(model_dir)
+tokenizer = AutoTokenizer.from_pretrained(model_dir)
+# Define a mapping from label IDs to their string representations
+label_map = {0: "Other", 1: "Helix", 2: "Strand"}
+# 2. Tokenize the New Protein Sequence
+new_protein_sequence = "MAVPETRPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFYDKPMRIQYAKTDSDIIAKMKGT"  # Replace with your protein sequence
+tokens = tokenizer.tokenize(new_protein_sequence)
+inputs = tokenizer.encode(new_protein_sequence, return_tensors="pt")
+# 3. Predict with the Model
+with torch.no_grad():
+    outputs = model(inputs).logits
+    predictions = np.argmax(outputs[0].numpy(), axis=1)
+# 4. Decode the Predictions (drop the <cls>/<eos> positions added by encode() so labels align with residues)
+predicted_labels = [label_map[label_id] for label_id in predictions[1:-1]]
+# Print the tokens along with their predicted labels
+for token, label in zip(tokens, predicted_labels):
+    print(f"{token}: {label}")
+```