Multilingual BERT base (uncased) model trained to predict Comparative Agendas Project (CAP) issue codes.
The model was trained on 120,000 assorted political documents, mostly from the Comparative Agendas Project.
Countries represented in the training data:
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland
LABELS USED IN TRAINING
Model labels -> CAP labels:
{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}
Model labels -> CAP issues:
{0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}
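The two mappings above can be combined to turn a raw model prediction into a CAP code and issue name. The following is a minimal sketch, not part of the original model card: the helper name `decode_prediction` is hypothetical, and it assumes the pipeline returns labels that end in the model label index (for example `LABEL_6` or `6`), which should be checked against the actual output. CAP codes are written as integers here rather than floats.

```python
import re

# Model label -> CAP code, copied from the mapping above (as integers).
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}

# Model label -> CAP issue name, copied from the mapping above.
MODEL_TO_CAP_ISSUE = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare',
    3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment',
    7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations',
    19: 'culture',
}


def decode_prediction(prediction):
    """Map one pipeline result dict to its CAP code and issue name.

    Assumption: prediction["label"] ends with the model label index.
    """
    match = re.search(r"(\d+)$", str(prediction["label"]))
    if match is None:
        raise ValueError(f"Cannot parse label: {prediction['label']!r}")
    idx = int(match.group(1))
    return {
        "cap_code": MODEL_TO_CAP_CODE[idx],
        "cap_issue": MODEL_TO_CAP_ISSUE[idx],
        "score": prediction["score"],
    }


# Hypothetical pipeline output, used only to illustrate the lookup.
example = {"label": "LABEL_6", "score": 0.97}
print(decode_prediction(example))
```

Note that the CAP codes skip 11, 21 and 22, so the model label index cannot be converted to a CAP code by simply adding one.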
Validation (classes are the model label indices from the mappings above):
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.72 | 0.83 | 0.77 | 211 |
1 | 0.82 | 0.77 | 0.79 | 242 |
2 | 0.82 | 0.86 | 0.84 | 251 |
3 | 0.92 | 0.89 | 0.90 | 228 |
4 | 0.81 | 0.85 | 0.83 | 220 |
5 | 0.90 | 0.93 | 0.91 | 244 |
6 | 0.87 | 0.87 | 0.87 | 230 |
7 | 0.92 | 0.88 | 0.90 | 251 |
8 | 0.94 | 0.90 | 0.92 | 237 |
9 | 0.87 | 0.88 | 0.87 | 263 |
10 | 0.70 | 0.88 | 0.78 | 189 |
11 | 0.90 | 0.81 | 0.85 | 248 |
12 | 0.87 | 0.90 | 0.88 | 222 |
13 | 0.76 | 0.72 | 0.74 | 255 |
14 | 0.84 | 0.84 | 0.84 | 241 |
15 | 0.92 | 0.79 | 0.85 | 276 |
16 | 0.95 | 0.90 | 0.92 | 258 |
17 | 0.71 | 0.82 | 0.76 | 200 |
18 | 0.77 | 0.73 | 0.75 | 215 |
19 | 0.92 | 0.91 | 0.92 | 239 |
Accuracy | | | 0.85 | 4720 |
Macro Avg | 0.85 | 0.85 | 0.85 | 4720 |
Weighted Avg | 0.85 | 0.85 | 0.85 | 4720 |
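As a sanity check on the summary rows, the macro and weighted averages can be recomputed from the per-class F1 scores and supports. This is a minimal sketch using only the numbers in the table; because the per-class values are already rounded to two decimals, the recomputed averages should land within about 0.01 of the reported 0.85 rather than match it exactly.

```python
# (F1-score, support) for classes 0..19, copied from the validation table.
ROWS = [
    (0.77, 211), (0.79, 242), (0.84, 251), (0.90, 228), (0.83, 220),
    (0.91, 244), (0.87, 230), (0.90, 251), (0.92, 237), (0.87, 263),
    (0.78, 189), (0.85, 248), (0.88, 222), (0.74, 255), (0.84, 241),
    (0.85, 276), (0.92, 258), (0.76, 200), (0.75, 215), (0.92, 239),
]

# Total number of validation documents (sum of the Support column).
total_support = sum(support for _, support in ROWS)

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for f1, _ in ROWS) / len(ROWS)

# Weighted average: per-class F1 scores weighted by class support.
weighted_f1 = sum(f1 * support for f1, support in ROWS) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.3f}")
print(f"weighted F1:   {weighted_f1:.3f}")
```

The two averages are close because the class supports are fairly balanced, ranging from 189 to 276 documents per class.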
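import torch
from transformers import AutoModelForSequenceClassification
from transformers import TextClassificationPipeline, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub.
mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# Use the first GPU if one is available; otherwise fall back to the CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=device)

# Classify one document; the result is a list of dicts with 'label' and 'score' keys.
classifier(
    "To ask the Secretary of State for Energy and Climate "
    "Change what estimate he has made of the proportion of carbon "
    "dioxide emissions arising in the UK attributable to burning."
)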