SPECTER2-base Multilabel Horizon Intervention Areas Classifier
This model is a fine-tuned version of SPECTER2-base for multilabel classification of scientific publications into Horizon Europe intervention areas.
Model Description
- Base model: allenai/specter2_base
- Task: Multilabel classification (assigns one or more intervention areas per document)
- Labels: 38 Horizon Europe intervention areas (see the per-category report below)
- Languages: English
- Input: Title and abstract concatenated
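The description above can be turned into a minimal inference sketch. The repository ID is the one shown in the model tree of this card; everything else is an assumption rather than documented behaviour: that the checkpoint loads with `AutoModelForSequenceClassification`, that title and abstract are joined with the tokenizer's separator token (the SPECTER2 convention), that `id2label` is stored in the model config, and that 0.5 is a sensible decision threshold. `build_input`, `decode_logits`, and `predict` are hypothetical helper names.

```python
import math

# Repository ID as listed in the model tree of this card.
MODEL_ID = "nicolauduran45/horizon-intervention_areas-classifier"


def build_input(title, abstract, sep_token="[SEP]"):
    """Join title and abstract. The separator token is an assumption."""
    return f"{title}{sep_token}{abstract}"


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_logits(logits, id2label, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {id2label[i]: p for i, p in enumerate(probs) if p >= threshold}


def predict(title, abstract, threshold=0.5):
    """Run the classifier on one publication (downloads the checkpoint)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    text = build_input(title, abstract, tokenizer.sep_token)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_logits(logits, model.config.id2label, threshold)
```

Because the task is multilabel, each logit is passed through a sigmoid and thresholded on its own, so a document may receive zero, one, or several labels. Lowering the threshold trades precision for recall.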
Training Details
- Training framework: Hugging Face Transformers (Trainer)
- Batch size: 4
- Learning rate: 2e-5
- Epochs: 10
- Optimizer: AdamW with weight decay 0.01
- Loss: Binary Cross-Entropy with Logits
- Best model selection: F1-score on validation set
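For readers unfamiliar with the loss named above, here is a small pure-Python sketch of binary cross-entropy with logits, averaged over labels. It mirrors the formula behind `torch.nn.BCEWithLogitsLoss` with mean reduction. Whether training used that class directly or another implementation is not stated in this card, and `bce_with_logits` is a hypothetical name.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One document, three labels: confident and right, confident and wrong, unsure.
print(bce_with_logits([4.0, 4.0, 0.0], [1, 0, 1]))
```

Each label contributes its own term, which is what lets the model switch labels on and off independently instead of choosing a single class as softmax cross-entropy would.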
Evaluation Metrics
Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
---|---|---|---|---|---|
1 | No log | 0.1823 | 0.280 | 0.582 | 0.093 |
2 | 0.1912 | 0.1605 | 0.493 | 0.676 | 0.151 |
3 | 0.1133 | 0.1390 | 0.593 | 0.732 | 0.236 |
4 | 0.0902 | 0.1316 | 0.644 | 0.762 | 0.281 |
5 | 0.0740 | 0.1221 | 0.697 | 0.791 | 0.334 |
6 | 0.0619 | 0.1216 | 0.722 | 0.809 | 0.374 |
7 | 0.0535 | 0.1204 | 0.741 | 0.820 | 0.382 |
8 | 0.0467 | 0.1195 | 0.750 | 0.826 | 0.414 |
9 | 0.0422 | 0.1188 | 0.759 | 0.830 | 0.430 |
10 | 0.0384 | 0.1184 | 0.765 | 0.834 | 0.435 |
Best epoch: 10 (highest F1/accuracy)
Final validation loss: 0.1184
Final F1: 0.7647
Final ROC AUC: 0.8344
Final Accuracy: 0.4350
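The gap between F1 and accuracy in the table is easier to read with a toy example. The sketch below assumes, without confirmation from this card, that F1 is micro-averaged over all label decisions and that accuracy is exact-match (subset) accuracy, where a document counts as correct only if every label matches. `micro_f1` and `subset_accuracy` are hypothetical helper names.

```python
def micro_f1(y_true, y_pred):
    """F1 over all (document, label) decisions pooled together."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of documents whose full label vector is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]

print(micro_f1(y_true, y_pred))         # 0.8
print(subset_accuracy(y_true, y_pred))  # 1 of 3 documents fully correct
```

Under these assumptions, a single missed or extra label makes a whole document count as wrong for accuracy while costing F1 only one decision, so an accuracy well below F1 is expected for a multilabel model.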
Per-Category Classification Report
Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Advanced Materials | 1.00 | 0.71 | 0.83 | 38 |
Advanced computing and big data | 0.85 | 0.74 | 0.79 | 39 |
Agriculture, forestry and rural areas | 0.86 | 0.60 | 0.71 | 20 |
Artificial intelligence and robotics | 0.85 | 1.00 | 0.92 | 17 |
Bio-based innovation systems in the bioeconomy | 0.83 | 0.59 | 0.69 | 17 |
Biodiversity and natural resources | 1.00 | 0.14 | 0.25 | 28 |
Buildings and industrial facilities in energy transition | 0.79 | 0.52 | 0.62 | 29 |
Circular Industries | 0.00 | 0.00 | 0.00 | 6 |
Circular systems | 0.52 | 0.74 | 0.61 | 19 |
Clean, safe and accessible transport and mobility | 0.93 | 0.78 | 0.85 | 18 |
Climate science and solutions | 0.82 | 0.80 | 0.81 | 41 |
Communities and cities | 0.87 | 0.61 | 0.72 | 44 |
Culture, cultural heritage and creativity | 0.97 | 0.87 | 0.92 | 45 |
Cybersecurity | 1.00 | 0.33 | 0.50 | 3 |
Democracy and Governance | 1.00 | 0.67 | 0.80 | 18 |
Disaster-resilient societies | 1.00 | 0.48 | 0.65 | 27 |
Emerging enabling technologies | 0.84 | 0.71 | 0.77 | 82 |
Energy storage | 0.00 | 0.00 | 0.00 | 4 |
Energy supply | 0.62 | 0.89 | 0.73 | 37 |
Energy systems and grids | 0.83 | 0.66 | 0.73 | 44 |
Environmental and social health determinants | 1.00 | 0.58 | 0.74 | 12 |
Environmental observation | 1.00 | 0.56 | 0.72 | 16 |
Food systems | 0.00 | 0.00 | 0.00 | 11 |
Health throughout the life course | 1.00 | 0.63 | 0.77 | 19 |
Healthcare systems | 0.96 | 0.83 | 0.89 | 29 |
Industrial competitiveness in transport | 1.00 | 0.20 | 0.33 | 5 |
Infectious diseases, including poverty-related and neglected diseases | 0.93 | 0.93 | 0.93 | 15 |
Key digital technologies | 0.94 | 0.66 | 0.78 | 89 |
Manufacturing technologies | 0.98 | 0.94 | 0.96 | 49 |
Net-zero and less polluting Industries | 0.80 | 0.25 | 0.38 | 16 |
Next generation internet | 0.78 | 0.54 | 0.64 | 13 |
Non-communicable and rare diseases | 0.96 | 1.00 | 0.98 | 22 |
Protection and security | 1.00 | 0.57 | 0.73 | 28 |
Seas, oceans and inland waters | 1.00 | 0.26 | 0.42 | 19 |
Smart mobility | 0.80 | 0.67 | 0.73 | 12 |
Social and economic transformations | 0.90 | 0.64 | 0.75 | 59 |
Space, including Earth observation | 1.00 | 0.55 | 0.71 | 11 |
Tools, technologies and digital solutions for health and care, including personalised medicine | 0.98 | 0.92 | 0.95 | 49 |
micro avg | 0.88 | 0.68 | 0.76 | 1050 |
macro avg | 0.83 | 0.59 | 0.67 | 1050 |
weighted avg | 0.88 | 0.68 | 0.75 | 1050 |
samples avg | 0.85 | 0.72 | 0.76 | 1050 |
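The macro and weighted averages can be reproduced from the per-label rows as a sanity check. The sketch below copies the (F1-score, support) pairs from the table above in row order. Because the inputs are already rounded to two decimals, the results agree with the reported averages only to that precision.

```python
# (F1-score, support) for each label, in the order of the table above.
ROWS = [
    (0.83, 38), (0.79, 39), (0.71, 20), (0.92, 17), (0.69, 17), (0.25, 28),
    (0.62, 29), (0.00, 6), (0.61, 19), (0.85, 18), (0.81, 41), (0.72, 44),
    (0.92, 45), (0.50, 3), (0.80, 18), (0.65, 27), (0.77, 82), (0.00, 4),
    (0.73, 37), (0.73, 44), (0.74, 12), (0.72, 16), (0.00, 11), (0.77, 19),
    (0.89, 29), (0.33, 5), (0.93, 15), (0.78, 89), (0.96, 49), (0.38, 16),
    (0.64, 13), (0.98, 22), (0.73, 28), (0.42, 19), (0.73, 12), (0.75, 59),
    (0.71, 11), (0.95, 49),
]

total_support = sum(support for _, support in ROWS)

# Macro average: every label counts equally, however rare it is.
macro_f1 = sum(f1 for f1, _ in ROWS) / len(ROWS)

# Weighted average: each label counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in ROWS) / total_support

print(len(ROWS), total_support)  # 38 1050
print(round(macro_f1, 2))        # 0.67
print(round(weighted_f1, 2))     # 0.75
```

The macro average sits below the weighted average because low-support labels with an F1 of 0.00 (Circular Industries, Energy storage, Food systems) count as much as well-populated ones.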
License
This model is licensed under the Apache License 2.0.
Model tree for nicolauduran45/horizon-intervention_areas-classifier
Base model
allenai/specter2_base