library_name: transformers
license: apache-2.0
base_model:
  - allenai/specter2_base
pipeline_tag: text-classification
datasets:
  - nicolauduran45/horizon_clusters_annotated

SPECTER2-base Multilabel Horizon Clusters Classifier

This model is a fine-tuned version of SPECTER2-base (allenai/specter2_base) for multilabel classification of scientific publications into the six Horizon Europe clusters.


Model Description

  • Base model: allenai/specter2_base
  • Task: Multilabel classification (assigns one or more clusters per document)
  • Labels: 6 Horizon Europe clusters (see below)
  • Languages: English
  • Input: Title and abstract concatenated
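
The input and output format described above can be sketched as follows. This is a minimal example under stated assumptions, not the author's inference code: `MODEL_ID` is a hypothetical placeholder for this model's Hub repository id, joining title and abstract with the tokenizer's separator token follows the usual SPECTER2 convention (the card only says "concatenated"), and the 0.5 decision threshold is an illustrative default.

```python
import math

# Hypothetical placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "<user>/<this-model>"


def decode(logits, id2label, threshold=0.5):
    """Turn one row of raw logits into the list of predicted cluster names."""
    # Multilabel: each label gets its own independent sigmoid (no softmax).
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [id2label[i] for i, p in enumerate(probs) if p >= threshold]


def predict(title, abstract, threshold=0.5):
    """Classify one publication; downloads the model on first use."""
    # Heavy dependencies are imported here so decode() stays usable on its own.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Assumption: title and abstract are joined with the separator token.
    text = title + tokenizer.sep_token + abstract
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits, model.config.id2label, threshold)
```

Calling `predict("Some title", "Some abstract ...")` would return a list of zero or more cluster names. If the checkpoint's config does not define label names, `id2label` falls back to generic names such as `LABEL_0`.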

Training Details

  • Training framework: Hugging Face Transformers (Trainer)
  • Batch size: 4
  • Learning rate: 2e-5
  • Epochs: 6
  • Optimizer: AdamW with weight decay 0.01
  • Loss: Binary Cross-Entropy with Logits
  • Best model selection: F1-score on validation set
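
The loss listed above scores each of the six labels as an independent yes/no decision. The following pure-Python sketch shows the per-document computation in its numerically stable form, averaged over labels. It is meant to illustrate the formula; the card does not say which implementation was used during training.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# A confident, correct prediction costs little; a confident, wrong one costs a lot.
good = bce_with_logits([5.0, -5.0], [1.0, 0.0])
bad = bce_with_logits([5.0, -5.0], [0.0, 1.0])
```

Working from logits rather than probabilities avoids taking the logarithm of values that have been rounded to 0 or 1.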

Clusters (Labels)

  • Civil Security for Society
  • Climate, Energy and Mobility
  • Culture, Creativity and Inclusive Society
  • Digital, Industry and Space
  • Food, Bioeconomy, Natural Resources, Agriculture and Environment
  • Health

Evaluation Metrics

| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|---|---|---|---|---|---|
| 1 | No log | 0.1774 | 0.910 | 0.9368 | 0.766 |
| 2 | 0.0606 | 0.1849 | 0.921 | 0.9454 | 0.787 |
| 3 | 0.0351 | 0.2071 | 0.919 | 0.9434 | 0.787 |
| 4 | 0.0180 | 0.2191 | 0.921 | 0.9451 | 0.793 |
| 5 | 0.0093 | 0.2295 | 0.921 | 0.9451 | 0.793 |
| 6 | 0.0060 | 0.2307 | 0.921 | 0.9451 | 0.793 |

Best epoch: 6. F1 and accuracy reach their highest reported values by epoch 4 and are unchanged through epoch 6, while validation loss increases every epoch.

  • Final validation loss: 0.2307
  • Final F1: 0.9212
  • Final ROC AUC: 0.9451
  • Final Accuracy: 0.7927
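
The gap between F1 (0.92) and accuracy (0.79) is what one would expect if accuracy is exact-match (subset) accuracy, where a document counts as correct only when all six labels are right. The card does not define its metrics, so the sketch below assumes micro-averaged F1 and subset accuracy, and uses made-up toy predictions rather than the model's outputs.

```python
def micro_f1(y_true, y_pred):
    """F1 over all (document, label) decisions pooled together."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of documents whose full label vector is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy data: one missed label in the second document.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
f1 = micro_f1(y_true, y_pred)          # 8/9: four hits, one miss
acc = subset_accuracy(y_true, y_pred)  # 2/3: the second document is not an exact match
```

A single missed label lowers micro F1 only slightly but turns the whole document into an error for subset accuracy, which is why the second metric tends to be the stricter one.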

Per-Category Classification Report

| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Civil Security for Society | 0.97 | 0.79 | 0.87 | 39 |
| Climate, Energy and Mobility | 0.94 | 0.91 | 0.93 | 91 |
| Culture, Creativity and Inclusive Society | 0.89 | 0.88 | 0.88 | 96 |
| Digital, Industry and Space | 0.93 | 0.92 | 0.93 | 214 |
| Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89 | 0.97 | 0.93 | 75 |
| Health | 0.96 | 0.96 | 0.96 | 73 |
| micro avg | 0.93 | 0.91 | 0.92 | 588 |
| macro avg | 0.93 | 0.91 | 0.92 | 588 |
| weighted avg | 0.93 | 0.91 | 0.92 | 588 |
| samples avg | 0.91 | 0.92 | 0.90 | 588 |
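
The macro and weighted rows can be re-derived from the per-label rows: the macro average is the unweighted mean over labels, and the weighted average weights each label by its support. The short check below recomputes the F1 averages from the rounded values in the report, so it can only be expected to agree to two decimals.

```python
# (F1-score, support) per label, copied from the report above.
rows = [(0.87, 39), (0.93, 91), (0.88, 96), (0.93, 214), (0.93, 75), (0.96, 73)]

total_support = sum(s for _, s in rows)
macro_f1 = sum(f for f, _ in rows) / len(rows)
weighted_f1 = sum(f * s for f, s in rows) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
# prints: 588 0.92 0.92
```

Note that the support of 588 counts label assignments, not documents, since a document can belong to more than one cluster.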

License

This model is licensed under the Apache License 2.0.