AmelieSchreiber committed
Commit 0a63d8c · 1 Parent(s): ec558c8

Create README.md

Files changed (1)
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
---
license: mit
language:
- en
library_name: transformers
tags:
- esm
- esm2
- protein language model
- biology
- protein token classification
- secondary structure prediction
---

# ESM-2 (esm2_t6_8M_UR50D) for Token Classification

This is a fine-tuned version of the small ESM-2 protein language model (esm2_t6_8M_UR50D) for secondary structure prediction, treated as a token classification task: each residue in a protein sequence is labeled as Other, Helix, or Strand.

## Using the Model

To use the model, try running:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import numpy as np
import torch

# 1. Prepare the Model and Tokenizer
# Replace with the path where your trained model is saved if you're training a new model
model_dir = "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-secondary-structure"

model = AutoModelForTokenClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Define a mapping from label IDs to their string representations
label_map = {0: "Other", 1: "Helix", 2: "Strand"}

# 2. Tokenize the New Protein Sequence
new_protein_sequence = "MAVPETRPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFYDKPMRIQYAKTDSDIIAKMKGT"  # Replace with your protein sequence
tokens = tokenizer.tokenize(new_protein_sequence)
inputs = tokenizer.encode(new_protein_sequence, return_tensors="pt")

# 3. Predict with the Model
with torch.no_grad():
    outputs = model(inputs).logits

# Argmax over the label dimension gives one label ID per tokenized position
predictions = np.argmax(outputs[0].numpy(), axis=1)

# 4. Decode the Predictions
# tokenizer.encode adds special tokens (<cls> at the start, <eos> at the end)
# that tokenizer.tokenize does not, so skip their predictions to stay aligned
predicted_labels = [label_map[label_id] for label_id in predictions[1 : len(tokens) + 1]]

# Print the tokens along with their predicted labels
for token, label in zip(tokens, predicted_labels):
    print(f"{token}: {label}")
```
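
If you prefer the output as a single secondary-structure string aligned with the input sequence, you can map each label to a one-letter code and join the results. A minimal sketch: the codes below (H for helix, E for strand, C for everything else) follow a common secondary-structure convention and are chosen here purely for illustration, not defined by the model card; the snippet assumes the `new_protein_sequence` and `predicted_labels` variables from the example above.

```python
# Collapse the per-residue labels into one string.
# The one-letter codes are a common convention, not defined by this model.
code_map = {"Helix": "H", "Strand": "E", "Other": "C"}
ss_string = "".join(code_map[label] for label in predicted_labels)

print(new_protein_sequence)
print(ss_string)  # one character of predicted structure per residue
```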