SPECTER2-base Multilabel Horizon Intervention Areas Classifier

This model fine-tunes SPECTER2-base for multilabel classification of scientific publications into Horizon Europe intervention areas.


Model Description

  • Base model: allenai/specter2_base
  • Task: Multilabel classification (assigns one or more intervention areas per document)
  • Labels: 38 Horizon Europe intervention areas (listed in the per-category classification report below)
  • Languages: English
  • Input: Title and abstract concatenated
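
As a rough illustration of the input and output conventions listed above, the sketch below joins a title and abstract, then turns per-label logits into a list of predicted intervention areas. The separator token, the 0.5 threshold, the 512-token limit, and the expectation that the checkpoint ships a tokenizer and an `id2label` mapping are assumptions that this card does not state; `build_input`, `decode`, and `predict` are hypothetical helper names.

```python
import math


def build_input(title: str, abstract: str, sep_token: str = "[SEP]") -> str:
    """Concatenate title and abstract (the separator is an assumption)."""
    return f"{title.strip()} {sep_token} {abstract.strip()}"


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, id2label, threshold: float = 0.5):
    """Keep every label whose independent probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [(id2label[i], p) for i, p in enumerate(probs) if p >= threshold]


def predict(title, abstract,
            model_id="nicolauduran45/horizon-intervention_areas-classifier"):
    """Run the classifier end to end (needs torch, transformers, network access)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    text = build_input(title, abstract, tokenizer.sep_token)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits, model.config.id2label)
```

Because the task is multilabel, a sigmoid is applied to each logit separately rather than a softmax across labels, so a document can receive several areas or none.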

Training Details

  • Training framework: Hugging Face Transformers (Trainer)
  • Batch size: 4
  • Learning rate: 2e-5
  • Epochs: 10
  • Optimizer: AdamW with weight decay 0.01
  • Loss: Binary Cross-Entropy with Logits
  • Best model selection: F1-score on validation set
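
The loss named above, binary cross-entropy with logits, treats each intervention area as an independent yes/no decision. The following sketch shows a multi-hot target encoding and a numerically stable form of the loss in plain Python. `multi_hot` and `bce_with_logits` are hypothetical helper names, and the mean reduction over labels is an assumption: it matches the default of PyTorch's `BCEWithLogitsLoss`, which Transformers applies when `problem_type="multi_label_classification"`, but the card does not confirm that exact setup.

```python
import math


def multi_hot(assigned, all_labels):
    """Encode a set of assigned labels as a 0/1 vector over all labels."""
    return [1.0 if label in assigned else 0.0 for label in all_labels]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which avoids overflow
    for large positive or negative logits.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Illustrative two-label example (not taken from the training data).
all_labels = ["Energy storage", "Energy supply"]
targets = multi_hot({"Energy supply"}, all_labels)
loss = bce_with_logits([0.0, 0.0], targets)  # both logits undecided
```

With both logits at zero the model assigns probability 0.5 to every label, so the loss equals ln 2 regardless of the targets.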

Evaluation Metrics

Epoch  Training Loss  Validation Loss  F1     ROC AUC  Accuracy
1      No log         0.1823           0.280  0.582    0.093
2      0.1912         0.1605           0.493  0.676    0.151
3      0.1133         0.1390           0.593  0.732    0.236
4      0.0902         0.1316           0.644  0.762    0.281
5      0.0740         0.1221           0.697  0.791    0.334
6      0.0619         0.1216           0.722  0.809    0.374
7      0.0535         0.1204           0.741  0.820    0.382
8      0.0467         0.1195           0.750  0.826    0.414
9      0.0422         0.1188           0.759  0.830    0.430
10     0.0384         0.1184           0.765  0.834    0.435

Best epoch: 10 (highest F1/accuracy)

Final validation loss: 0.1184

Final F1: 0.7647

Final ROC AUC: 0.8344

Final Accuracy: 0.4350
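
The card does not say how F1 and accuracy are computed. One reading that would explain why accuracy (0.4350) sits well below F1 (0.7647) is that F1 is micro-averaged over all individual label decisions, while accuracy counts a document as correct only when its whole label set matches (subset accuracy). The sketch below illustrates both metrics under that assumption on made-up predictions; `micro_f1` and `subset_accuracy` are hypothetical helper names.

```python
def micro_f1(y_true, y_pred):
    """F1 over all (document, label) decisions pooled together."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Share of documents whose predicted label set is exactly right."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Made-up example: three documents, three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]

f1 = micro_f1(y_true, y_pred)          # 4 TP, 1 FP, 1 FN
acc = subset_accuracy(y_true, y_pred)  # only the second document matches exactly
```

In this toy case most label decisions are right, yet two of the three documents contain one wrong label each, so subset accuracy is far lower than micro F1.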


Per-Category Classification Report

Precision  Recall  F1-score  Support  Label
1.00       0.71    0.83      38       Advanced Materials
0.85       0.74    0.79      39       Advanced computing and big data
0.86       0.60    0.71      20       Agriculture, forestry and rural areas
0.85       1.00    0.92      17       Artificial intelligence and robotics
0.83       0.59    0.69      17       Bio-based innovation systems in the bioeconomy
1.00       0.14    0.25      28       Biodiversity and natural resources
0.79       0.52    0.62      29       Buildings and industrial facilities in energy transition
0.00       0.00    0.00      6        Circular Industries
0.52       0.74    0.61      19       Circular systems
0.93       0.78    0.85      18       Clean, safe and accessible transport and mobility
0.82       0.80    0.81      41       Climate science and solutions
0.87       0.61    0.72      44       Communities and cities
0.97       0.87    0.92      45       Culture, cultural heritage and creativity
1.00       0.33    0.50      3        Cybersecurity
1.00       0.67    0.80      18       Democracy and Governance
1.00       0.48    0.65      27       Disaster-resilient societies
0.84       0.71    0.77      82       Emerging enabling technologies
0.00       0.00    0.00      4        Energy storage
0.62       0.89    0.73      37       Energy supply
0.83       0.66    0.73      44       Energy systems and grids
1.00       0.58    0.74      12       Environmental and social health determinants
1.00       0.56    0.72      16       Environmental observation
0.00       0.00    0.00      11       Food systems
1.00       0.63    0.77      19       Health throughout the life course
0.96       0.83    0.89      29       Healthcare systems
1.00       0.20    0.33      5        Industrial competitiveness in transport
0.93       0.93    0.93      15       Infectious diseases, including poverty-related and neglected diseases
0.94       0.66    0.78      89       Key digital technologies
0.98       0.94    0.96      49       Manufacturing technologies
0.80       0.25    0.38      16       Net-zero and less polluting Industries
0.78       0.54    0.64      13       Next generation internet
0.96       1.00    0.98      22       Non-communicable and rare diseases
1.00       0.57    0.73      28       Protection and security
1.00       0.26    0.42      19       Seas, oceans and inland waters
0.80       0.67    0.73      12       Smart mobility
0.90       0.64    0.75      59       Social and economic transformations
1.00       0.55    0.71      11       Space, including Earth observation
0.98       0.92    0.95      49       Tools, technologies and digital solutions for health and care, including personalised medicine
0.88       0.68    0.76      1050     micro avg
0.83       0.59    0.67      1050     macro avg
0.88       0.68    0.75      1050     weighted avg
0.85       0.72    0.76      1050     samples avg
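
To help read the report, the sketch below recomputes an F1-score from its precision and recall and contrasts macro averaging (every label counts equally) with support-weighted averaging, using four rows copied from the report. The helper names are hypothetical, and the four-row subset is only an illustration, so its averages are not the report's overall figures.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(scores):
    """Unweighted mean: rare labels count as much as frequent ones."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of true examples per label."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# High precision with very low recall still gives a low F1,
# as in the "Biodiversity and natural resources" row (1.00, 0.14).
biodiversity_f1 = f1_from_pr(1.00, 0.14)

# (label, F1-score, support) for four rows of the report.
rows = [
    ("Manufacturing technologies", 0.96, 49),
    ("Key digital technologies", 0.78, 89),
    ("Circular Industries", 0.00, 6),
    ("Energy storage", 0.00, 4),
]
f1_scores = [f1 for _, f1, _ in rows]
supports = [n for _, _, n in rows]

macro = macro_average(f1_scores)
weighted = weighted_average(f1_scores, supports)
```

Labels with very few validation examples and an F1 of 0.00 pull the macro average down much more than the weighted one, which is consistent with the macro avg row (0.67) sitting below the weighted avg row (0.75).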

License

This model is licensed under the Apache License 2.0.
