Update README.md
README.md CHANGED

---
license: apache-2.0
language:
- en
base_model:
- allenai/scibert_scivocab_uncased
tags:
- Science
- classifier
- words
---

<b><span style="color:red;">IMPORTANT! READ THIS!</span></b>

## Model description

This model recognizes scientific terms in a given *text*. The best way to use it is as follows:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from nltk.tokenize import word_tokenize  # may require nltk.download("punkt") once
import torch
import spacy

# You can use spaCy to remove named entities from the text
# (the model usually predicts them as scientific).
# Run `python -m spacy download en_core_web_sm` once if the model is not installed.
nlp = spacy.load("en_core_web_sm")
# doc = nlp(text)
# names = [ent.text for ent in doc.ents]

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("JonyC/scibert-science-word-classifier")
model = AutoModelForTokenClassification.from_pretrained("JonyC/scibert-science-word-classifier").to(device)
model.eval()

# Define max_len as needed.
def classify_term(term, max_len=12):
    term = term.lower()
    tokens = tokenizer(term, return_tensors="pt", truncation=True, padding=True, max_length=max_len).to(device)
    with torch.no_grad():
        logits = model(**tokens).logits          # shape: (1, num_tokens, num_labels)
    preds = logits.argmax(dim=-1)[0]             # per-token predicted class ids
    # Treat the term as scientific if any of its tokens is predicted as class 1 (one possible aggregation).
    return "Scientific" if (preds == 1).any().item() else "Non-Scientific"

# For a single term:
print(classify_term("quantum mechanics"))
print(classify_term("table"))
print(classify_term("photosynthesis"))

# For sentences:
words = word_tokenize("some sentence")  # you can also use sentence.split()
results = []
for w in words:
    res = classify_term(w)
    results.append(res)

for w, p in zip(words, results):
    print(f"Word: {w}, Predicted Label: {p}")
```

## Example usage

Given the following text:
"Quantum computing is a new field that changes how we think about solving complex problems. Unlike regular computers that use bits (which are either 0 or 1), quantum computers use qubits, which can be both 0 and 1 at the same time, thanks to a property called superposition.
|
58 |
+
One important feature of quantum computers is quantum entanglement, where two qubits can be linked in such a way that changing one will instantly affect the other, no matter how far apart they are.
|
59 |
+
This allows quantum computers to perform certain calculations much faster than traditional computers. For example, quantum computers could one day factor large numbers much faster, which is currently a task that takes regular computers a very long time. However, there are still challenges to overcome, like maintaining the qubits' state long enough to do calculations without errors.
|
60 |
+
Scientists are working on ways to fix these errors, which is necessary for quantum computers to work on a large scale and solve real-world problems more efficiently than today's computers."
|

the words it classified as scientific are:
```
['Quantum', 'computing', 'field', 'complex', 'quantum', 'qubits', 'property', 'superposition', 'entanglement', 'matter', 'factor', 'state', 'scale']
```
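
A word list like the one above can be produced along the lines of the commented-out spaCy filtering in the snippet earlier. Below is a minimal sketch that reuses `nlp`, `word_tokenize`, and `classify_term` from that snippet; the exact post-processing (skipping punctuation, numbers, and named-entity words) is an assumption, not a documented part of the model.

```python
# Sketch only: reuses nlp, word_tokenize and classify_term defined above.
# The filtering rules here are illustrative assumptions.
text = "..."  # paste the full quantum-computing example text from above

doc = nlp(text)
entity_words = {tok.text for ent in doc.ents for tok in ent}  # words inside named entities

scientific_words = []
for w in word_tokenize(text):
    if not w.isalpha() or w in entity_words:
        continue  # skip punctuation, numbers, and named entities
    if classify_term(w) == "Scientific":
        scientific_words.append(w)

print(scientific_words)
```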

## results_bert-finetuned-ner

This model is a fine-tuned version of [allenai/scibert_scivocab_cased](https://huggingface.co/allenai/scibert_scivocab_cased) on the [JonyC/ScienceGlossary](https://huggingface.co/datasets/JonyC/ScienceGlossary) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1763
- Precision: 0.9487
- Recall: 0.9068
- F1: 0.9273
- Accuracy: 0.9695
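
The evaluation script itself is not spelled out in this card, but the referenced dataset can be inspected directly; the snippet below only loads it (split and column names are whatever the dataset defines and are not documented here).

```python
# Hedged sketch: load the evaluation dataset referenced above and inspect it.
from datasets import load_dataset

ds = load_dataset("JonyC/ScienceGlossary")
print(ds)  # shows the available splits and columns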

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 35
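
For reference, the hyperparameters above roughly correspond to a `TrainingArguments` configuration along these lines; this is a hedged sketch, and the output directory and any settings not listed above are placeholders, not the values actually used.

```python
# Sketch only: mirrors the hyperparameters listed above; other settings are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results_bert-finetuned-ner",   # placeholder output path
    learning_rate=7e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",            # AdamW; betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=35,
)
```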