SDG SciBERT Classifier (sdg-scibert-zo_up)

This repository contains a fine-tuned version of allenai/scibert_scivocab_cased for classifying scientific text into Sustainable Development Goal (SDG) categories.

  • Fine-tuned using the 🤗 transformers Trainer API
  • Uses standard AutoModelForSequenceClassification
  • Published with full label mappings, inference scripts, and a CLI tool

🧪 Quick Inference (Python)

You can use the model directly with the Hugging Face pipeline:

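import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

classifier = pipeline(
    "text-classification",
    model="simon-clmtd/sdg-scibert-zo_up",
    tokenizer="simon-clmtd/sdg-scibert-zo_up",
    truncation=True,  # cut inputs that exceed max_length
    padding=True,
    max_length=512,   # upper limit for BERT-style models
    top_k=None,       # scores for all labels; replaces the deprecated return_all_scores=True
    device=device,
)

text = "Ensure access to affordable, reliable, sustainable and modern energy for all"
print(classifier(text))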
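
When all label scores are returned, the best label still has to be picked out. A minimal sketch of that step follows. The helper name top_label and the sample scores are illustrative assumptions, not part of this repository. Depending on the transformers version and arguments, a single input string may yield either a flat list of label dictionaries or that list wrapped in an outer list, so the helper accepts both.

```python
from typing import Any, Dict


def top_label(output: Any) -> Dict[str, Any]:
    """Return the highest-scoring {"label", "score"} entry for one input text."""
    # Unwrap the outer list if the per-label scores are nested one level deep.
    scores = output[0] if output and isinstance(output[0], list) else output
    return max(scores, key=lambda item: item["score"])


# Hypothetical scores standing in for the result of classifier(text).
example_output = [[
    {"label": "1", "score": 0.0021},
    {"label": "2", "score": 0.0005},
    {"label": "7", "score": 0.9124},
]]

print(top_label(example_output))
```

With a real model, the call would be top_label(classifier(text)).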

🖥️ CLI Tool: sdg-predict

🔧 Installation (local)

Clone the repo and install as a Python package:

git clone https://huggingface.co/simon-clmtd/sdg-scibert-zo_up
cd sdg-scibert-zo_up
pip install -e .

This will install a command-line tool called sdg-predict.

📥 Input format

The CLI tool accepts a .jsonl file (one JSON object per line). Use --key to name the field that contains the text to classify.

Example input file (input.jsonl):

{"id": 1, "text": "Ensure access to affordable, reliable, sustainable and modern energy for all"}
{"id": 2, "text": "Atmospheric warming is profoundly affecting high-mountain regions"}
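
If the texts live in Python objects rather than in a file, an input file in this shape could be produced along the following lines. This is a sketch only: write_jsonl is a hypothetical helper, and the "id" and "text" keys simply mirror the example above.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line, the layout the CLI expects."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False keeps non-ASCII characters readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


records = [
    {"id": 1, "text": "Ensure access to affordable, reliable, sustainable and modern energy for all"},
    {"id": 2, "text": "Atmospheric warming is profoundly affecting high-mountain regions"},
]
write_jsonl(records, "input.jsonl")
```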

▢️ Example usage

Top-1 prediction:

sdg-predict input.jsonl --key text --top1 --output preds.jsonl

Full label distribution:

sdg-predict input.jsonl --key text --output preds_all.jsonl

Custom batch size:

sdg-predict input.jsonl --key text --batch_size 16
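
For intuition about what a batch size controls, the sketch below splits a list of texts into consecutive groups of at most batch_size items. It is not the logic in this repository's inference.py, whose internals are not shown here; chunked is a hypothetical helper written for illustration.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"abstract {i}" for i in range(5)]
batches = list(chunked(texts, batch_size=2))
for batch in batches:
    print(len(batch), batch)
```

Larger batches usually raise throughput on a GPU at the cost of memory. When calling a Hugging Face pipeline directly from Python, manual chunking is typically unnecessary, because pipelines accept a batch_size argument at call time.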

📤 Output format

Each output line is the original input with an added prediction key:

With --top1:

{
  "id": 1,
  "text": "...",
  "prediction": {
    "label": "7", 
    "score": 0.9124
  }
}

Without --top1:

{
  "id": 1,
  "text": "...",
  "prediction": [
    {"label": "1", "score": 0.0021},
    {"label": "2", "score": 0.0005},
    ...
    {"label": "7", "score": 0.9124}
  ]
}
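
Downstream code can treat both output shapes uniformly by reducing each record to its best label. The sketch below assumes that the labels "1" to "17" correspond to the SDG numbers, as the SDG 7 example above suggests; this mapping has not been verified against the published label files. SDG_NAMES and summarize are illustrative names, not part of this repository.

```python
import json

# Short titles of the 17 UN Sustainable Development Goals, keyed by the
# assumed label strings.
SDG_NAMES = {
    "1": "No Poverty", "2": "Zero Hunger", "3": "Good Health and Well-being",
    "4": "Quality Education", "5": "Gender Equality",
    "6": "Clean Water and Sanitation", "7": "Affordable and Clean Energy",
    "8": "Decent Work and Economic Growth",
    "9": "Industry, Innovation and Infrastructure",
    "10": "Reduced Inequalities", "11": "Sustainable Cities and Communities",
    "12": "Responsible Consumption and Production", "13": "Climate Action",
    "14": "Life Below Water", "15": "Life on Land",
    "16": "Peace, Justice and Strong Institutions",
    "17": "Partnerships for the Goals",
}


def summarize(line: str) -> tuple:
    """Return (id, label, SDG name, score) for one line of CLI output."""
    record = json.loads(line)
    prediction = record["prediction"]
    # Without --top1 the prediction is a list of scores; pick the best entry.
    if isinstance(prediction, list):
        prediction = max(prediction, key=lambda item: item["score"])
    name = SDG_NAMES.get(prediction["label"], "unknown label")
    return record.get("id"), prediction["label"], name, prediction["score"]


top1_line = '{"id": 1, "text": "...", "prediction": {"label": "7", "score": 0.9124}}'
full_line = ('{"id": 1, "text": "...", "prediction": ['
             '{"label": "1", "score": 0.0021}, {"label": "7", "score": 0.9124}]}')
print(summarize(top1_line))
print(summarize(full_line))
```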

📦 Repository Contents

  • modeling.py: Optional class wrapper if extending the base model.
  • inference.py: Reusable batch inference logic for Python scripts.
  • cli_predict.py: CLI tool using the inference logic.
  • requirements.txt: Runtime dependencies.
  • setup.py: Installation and entry point for the CLI.

🔍 Citation

Please cite the original SciBERT paper if using this model, and attribute this fine-tuning setup if relevant.


👤 Author

Simon Clematide
Computational Linguistics, UZH
simon-clematide.net (if applicable)
