SDG SciBERT Classifier (sdg-scibert-zo_up)
This repository contains a fine-tuned version of allenai/scibert_scivocab_cased for classifying scientific text into Sustainable Development Goal (SDG) categories.
- Fine-tuned using the 🤗 transformers Trainer API
- Uses standard AutoModelForSequenceClassification
- Published with full label mappings, inference scripts, and CLI tool
🧪 Quick Inference (Python)
You can use the model directly with the Hugging Face pipeline:
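from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="simon-clmtd/sdg-scibert-zo_up",
    tokenizer="simon-clmtd/sdg-scibert-zo_up",
    truncation=True,  # cut inputs that exceed max_length
    padding=True,
    max_length=512,  # upper limit for BERT-style models
    # Return a score for every label. Recent transformers releases
    # deprecate this argument in favor of top_k=None.
    return_all_scores=True,
    device=0,  # first GPU; use -1 for CPU
)

text = "Ensure access to affordable, reliable, sustainable and modern energy for all"
print(classifier(text))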
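With all scores returned, the result for one text is a list of label/score entries, which some transformers versions wrap in an extra outer list. The sketch below shows one way to pull out the highest-scoring labels. The helper name top_k_labels is hypothetical and the scores are illustrative values, not real model output.

```python
from typing import Any, Dict, List


def top_k_labels(result: List[Any], k: int = 3) -> List[Dict[str, Any]]:
    """Return the k highest-scoring entries for a single input text."""
    # Unwrap the outer list if the pipeline returned [[{...}, ...]].
    if result and isinstance(result[0], list):
        result = result[0]
    # Sort by score, highest first, and keep the first k entries.
    return sorted(result, key=lambda entry: entry["score"], reverse=True)[:k]


# Illustrative scores only (assumed shape of the pipeline output).
example = [[
    {"label": "1", "score": 0.0021},
    {"label": "7", "score": 0.9124},
    {"label": "13", "score": 0.0450},
]]

for entry in top_k_labels(example, k=2):
    print(entry["label"], entry["score"])
```

Sorting in Python rather than relying on the pipeline's ordering keeps the helper independent of the installed transformers version.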
🖥️ CLI Tool: sdg-predict
🔧 Installation (local)
Clone the repo and install as a Python package:
git clone https://huggingface.co/simon-clmtd/sdg-scibert-zo_up
cd sdg-scibert-zo_up
pip install -e .
This will install a command-line tool called sdg-predict.
📥 Input format
The CLI tool accepts a .jsonl file (one JSON object per line). You must specify the key containing the text to classify with --key.
Example input file (input.jsonl):
{"id": 1, "text": "Ensure access to affordable, reliable, sustainable and modern energy for all"}
{"id": 2, "text": "Atmospheric warming is profoundly affecting high-mountain regions"}
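An input file in this shape can also be produced and checked from Python before running the CLI. The following is a minimal sketch using only the standard library. The helper names write_jsonl and check_jsonl are hypothetical and are not part of this repository.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def check_jsonl(path, key="text"):
    """Count records; raise if a line has no string value under `key`."""
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get(key), str):
                raise ValueError(f"line {line_no}: missing or non-string {key!r}")
            count += 1
    return count


records = [
    {"id": 1, "text": "Ensure access to affordable, reliable, sustainable and modern energy for all"},
    {"id": 2, "text": "Atmospheric warming is profoundly affecting high-mountain regions"},
]

# Written to a temporary directory here; use any path you like in practice.
path = Path(tempfile.mkdtemp()) / "input.jsonl"
write_jsonl(records, path)
print(check_jsonl(path, key="text"))
```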
▶️ Example usage
Top-1 prediction:
sdg-predict input.jsonl --key text --top1 --output preds.jsonl
Full label distribution:
sdg-predict input.jsonl --key text --output preds_all.jsonl
Custom batch size:
sdg-predict input.jsonl --key text --batch_size 16
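The effect of a batch size can be pictured as splitting the input texts into fixed-size chunks and classifying one chunk at a time. The sketch below is an illustration only and is not the code in inference.py. The names batched and classify_in_batches are hypothetical, and predict_fn stands in for any callable that maps a list of texts to a list of predictions (a transformers pipeline accepts a list of strings in the same way).

```python
from typing import Any, Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def classify_in_batches(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], List[Any]],
    batch_size: int = 16,
) -> List[Any]:
    """Run predict_fn chunk by chunk and collect results in input order."""
    predictions: List[Any] = []
    for chunk in batched(texts, batch_size):
        predictions.extend(predict_fn(chunk))
    return predictions


def fake_predict(chunk: List[str]) -> List[Any]:
    """Stand-in predictor so the sketch runs without downloading the model."""
    return [{"label": "?", "score": 0.0} for _ in chunk]


texts = [f"abstract {i}" for i in range(5)]
print([len(chunk) for chunk in batched(texts, 2)])
print(len(classify_in_batches(texts, fake_predict, batch_size=2)))
```

Larger batches usually raise throughput on a GPU at the cost of memory, so a smaller value is the first thing to try after an out-of-memory error.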
📤 Output format
Each output line is the original input with an added prediction key:
With --top1:
{
  "id": 1,
  "text": "...",
  "prediction": {
    "label": "7",
    "score": 0.9124
  }
}
Without --top1:
{
  "id": 1,
  "text": "...",
  "prediction": [
    {"label": "1", "score": 0.0021},
    {"label": "2", "score": 0.0005},
    ...
    {"label": "7", "score": 0.9124}
  ]
}
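Both output shapes can be handled by the same downstream code. As a minimal sketch, assuming the record layout shown above, the snippet below reduces a full distribution to its best label and counts how often each SDG label is predicted. The helper name to_top1 is hypothetical, and the sample lines use illustrative scores rather than real model output.

```python
import json
from collections import Counter


def to_top1(record: dict) -> dict:
    """Return a copy of record whose prediction is a single label/score entry."""
    prediction = record["prediction"]
    if isinstance(prediction, list):
        # Full distribution: keep the entry with the highest score.
        prediction = max(prediction, key=lambda entry: entry["score"])
    return {**record, "prediction": prediction}


# Illustrative output lines; in practice, read them from preds.jsonl.
lines = [
    '{"id": 1, "text": "...", "prediction": [{"label": "1", "score": 0.0021}, {"label": "7", "score": 0.9124}]}',
    '{"id": 2, "text": "...", "prediction": {"label": "13", "score": 0.88}}',
]

records = [to_top1(json.loads(line)) for line in lines]
label_counts = Counter(record["prediction"]["label"] for record in records)
print(label_counts.most_common())
```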
📦 Repository Contents
- modeling.py: Optional class wrapper if extending the base model.
- inference.py: Reusable batch inference logic for Python scripts.
- cli_predict.py: CLI tool using the inference logic.
- requirements.txt: Runtime dependencies.
- setup.py: Installation and entry point for the CLI.
Citation
Please cite the original SciBERT paper if using this model, and attribute this fine-tuning setup if relevant.
👤 Author
Simon Clematide
Computational Linguistics, UZH
simon-clematide.net (if applicable)