SPECTER2-base Multilabel Horizon Clusters Classifier

This model is based on SPECTER2-base, fine-tuned for multilabel classification of scientific publications into Horizon Europe clusters.


Model Description

  • Base model: allenai/specter2_base
  • Task: Multilabel classification (assigns one or more clusters per document)
  • Labels: 6 Horizon Europe clusters (see below)
  • Languages: English
  • Input: Title and abstract concatenated
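
The card does not say how the title and abstract are joined, so the following is only a minimal sketch of one plausible inference path. It assumes the two fields are joined with the tokenizer's separator token (a common convention for SPECTER-family models), that the checkpoint loads with `AutoModelForSequenceClassification`, and that a 0.5 probability threshold is acceptable. The repository id is the one shown on the model page. `build_input`, `sigmoid`, and `classify` are illustrative helper names, not part of the model's API.

```python
import math

# Assumption: repository id as shown on the model page.
MODEL_ID = "nicolauduran45/horizon-clusters-classifier"


def build_input(title, abstract, sep_token="[SEP]"):
    """Join title and abstract into one string (the separator is an assumption)."""
    return f"{title.strip()} {sep_token} {abstract.strip()}"


def sigmoid(x):
    """Numerically stable logistic function mapping a logit to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify(title, abstract, threshold=0.5):
    """Return {label: probability} for every label at or above the threshold."""
    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    text = build_input(title, abstract, tokenizer.sep_token)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Multilabel head: each label gets its own independent sigmoid, no softmax.
    probs = {model.config.id2label[i]: sigmoid(z) for i, z in enumerate(logits)}
    return {label: p for label, p in probs.items() if p >= threshold}
```

Calling `classify("Some title", "Some abstract")` downloads the checkpoint on first use. The threshold can be tuned per label if precision or recall matters more for a given cluster.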

Training Details

  • Training framework: Hugging Face Transformers (Trainer)
  • Batch size: 4
  • Learning rate: 2e-5
  • Epochs: 6
  • Optimizer: AdamW with weight decay 0.01
  • Loss: Binary Cross-Entropy with Logits
  • Best model selection: F1-score on validation set
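
As a rough illustration of the settings above, the sketch below collects the listed hyperparameters under the matching `transformers.TrainingArguments` keyword names and shows what binary cross-entropy with logits computes for a single document. The card does not include the training script, so several points are assumptions: that the batch size is per device, that best-model selection was done with `load_best_model_at_end`, and that the metric key is `"f1"`. `bce_with_logits` is a hypothetical pure-Python helper written for clarity; in practice the loss would come from `torch.nn.BCEWithLogitsLoss`.

```python
import math

# Hyperparameters from the list above, keyed by TrainingArguments keyword names.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 4,   # assumption: "Batch size: 4" is per device
    "learning_rate": 2e-5,
    "num_train_epochs": 6,
    "weight_decay": 0.01,               # applied by the Trainer's default AdamW
    "load_best_model_at_end": True,     # assumption: how the best model was kept
    "metric_for_best_model": "f1",      # assumption: key returned by compute_metrics
}


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over the labels of one document, from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# One document, six cluster logits, two true clusters (toy values).
example_loss = bce_with_logits(
    [3.0, -2.5, -4.0, 1.5, -3.0, -2.0],
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
)
```

Each label contributes its own independent term to the loss, which is what lets a document belong to several clusters at once.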

Clusters (Labels)

  • Civil Security for Society
  • Climate, Energy and Mobility
  • Culture, Creativity and Inclusive Society
  • Digital, Industry and Space
  • Food, Bioeconomy, Natural Resources, Agriculture and Environment
  • Health
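
For multilabel training, each document's clusters are typically represented as a multi-hot vector with one position per label. The sketch below shows that encoding and the reverse thresholding step. The label order used here follows the list above and is an assumption; the order actually stored in the model's `id2label` config may differ. `encode` and `decode` are illustrative names.

```python
# Assumption: label order follows the list above, not necessarily the model config.
LABELS = [
    "Civil Security for Society",
    "Climate, Energy and Mobility",
    "Culture, Creativity and Inclusive Society",
    "Digital, Industry and Space",
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment",
    "Health",
]


def encode(cluster_names):
    """Turn a collection of cluster names into a multi-hot target vector."""
    names = set(cluster_names)
    unknown = names - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown cluster(s): {sorted(unknown)}")
    return [1.0 if label in names else 0.0 for label in LABELS]


def decode(probabilities, threshold=0.5):
    """Return the cluster names whose probability meets the threshold."""
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]
```

For example, a publication on medical imaging software might be encoded with `encode(["Health", "Digital, Industry and Space"])`, which sets two of the six positions to 1.0.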

Evaluation Metrics

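| Epoch | Training Loss | Validation Loss | F1    | ROC AUC | Accuracy |
|-------|---------------|-----------------|-------|---------|----------|
| 1     | No log        | 0.1774          | 0.910 | 0.9368  | 0.766    |
| 2     | 0.0606        | 0.1849          | 0.921 | 0.9454  | 0.787    |
| 3     | 0.0351        | 0.2071          | 0.919 | 0.9434  | 0.787    |
| 4     | 0.0180        | 0.2191          | 0.921 | 0.9451  | 0.793    |
| 5     | 0.0093        | 0.2295          | 0.921 | 0.9451  | 0.793    |
| 6     | 0.0060        | 0.2307          | 0.921 | 0.9451  | 0.793    |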

Best epoch: 6 (F1 and accuracy reach their highest values at epoch 4 and stay unchanged through epoch 6, while validation loss rises every epoch after the first)

  • Final validation loss: 0.2307
  • Final F1: 0.9212
  • Final ROC AUC: 0.9451
  • Final Accuracy: 0.7927
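
The gap between F1 (about 0.92) and accuracy (about 0.79) is typical of multilabel evaluation when accuracy means exact-match (subset) accuracy, where a document counts as correct only if all of its labels are predicted correctly. The card does not state which accuracy definition or which F1 averaging was used, so the sketch below is an assumption-based illustration on made-up toy data, using micro-averaged F1. scikit-learn's `accuracy_score` computes subset accuracy for multilabel indicator inputs, which makes this reading plausible.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of documents whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy data (not from the model): 4 documents, 3 labels, one missed label.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

f1 = micro_f1(y_true, y_pred)          # 5 of 6 true labels found, no false positives
acc = subset_accuracy(y_true, y_pred)  # 3 of 4 documents fully correct
```

A single missed label costs one whole document in subset accuracy but only one label in F1, so accuracy drops faster than F1 as the number of labels per document grows.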

Per-Category Classification Report

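| Label                                                            | Precision | Recall | F1-score | Support |
|------------------------------------------------------------------|-----------|--------|----------|---------|
| Civil Security for Society                                       | 0.97      | 0.79   | 0.87     | 39      |
| Climate, Energy and Mobility                                     | 0.94      | 0.91   | 0.93     | 91      |
| Culture, Creativity and Inclusive Society                        | 0.89      | 0.88   | 0.88     | 96      |
| Digital, Industry and Space                                      | 0.93      | 0.92   | 0.93     | 214     |
| Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89      | 0.97   | 0.93     | 75      |
| Health                                                           | 0.96      | 0.96   | 0.96     | 73      |
| micro avg                                                        | 0.93      | 0.91   | 0.92     | 588     |
| macro avg                                                        | 0.93      | 0.91   | 0.92     | 588     |
| weighted avg                                                     | 0.93      | 0.91   | 0.92     | 588     |
| samples avg                                                      | 0.91      | 0.92   | 0.90     | 588     |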
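
The macro and weighted averages in the report can be approximately recomputed from the per-label rows, which is a useful sanity check when reading such tables. The sketch below uses the rounded values from the report, so the results agree with the published averages only to about two decimals. `macro_average` and `weighted_average` are illustrative helper names.

```python
# (label, precision, recall, f1, support) copied from the per-label rows above.
REPORT = [
    ("Civil Security for Society", 0.97, 0.79, 0.87, 39),
    ("Climate, Energy and Mobility", 0.94, 0.91, 0.93, 91),
    ("Culture, Creativity and Inclusive Society", 0.89, 0.88, 0.88, 96),
    ("Digital, Industry and Space", 0.93, 0.92, 0.93, 214),
    ("Food, Bioeconomy, Natural Resources, Agriculture and Environment", 0.89, 0.97, 0.93, 75),
    ("Health", 0.96, 0.96, 0.96, 73),
]

# Column indices into each row.
PRECISION, RECALL, F1, SUPPORT = 1, 2, 3, 4


def macro_average(rows, column):
    """Unweighted mean over labels: every cluster counts equally."""
    return sum(row[column] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean over labels weighted by support: frequent clusters count more."""
    total_support = sum(row[SUPPORT] for row in rows)
    return sum(row[column] * row[SUPPORT] for row in rows) / total_support


total_support = sum(row[SUPPORT] for row in REPORT)
macro_f1 = macro_average(REPORT, F1)
weighted_f1 = weighted_average(REPORT, F1)
```

The macro average treats the smallest cluster (Civil Security for Society, support 39) the same as the largest (Digital, Industry and Space, support 214), so it is the more sensitive of the two to weak performance on rare clusters.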

License

This model is licensed under the Apache License 2.0.
