Multilingual BERT base (multilingual uncased) model trained to predict Comparative Agendas Project (CAP) issue codes.

The model was trained on 120,000 assorted political documents, mostly from the Comparative Agendas Project.

Countries:

Italy
Sweden
France
Switzerland
Poland
Netherlands
Germany
Denmark
Spain
UK
Austria
Ireland

LABELS USED IN TRAINING

Model labels -> CAP labels:
{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}
Model labels -> CAP issues:
{0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}
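
The two mappings above can be combined into a small lookup helper. The sketch below is illustrative only: `decode_label` is a hypothetical name, the CAP codes are written as integers rather than floats, and it assumes that string labels returned by a pipeline end in the integer class index (for example `LABEL_8`), which is the usual Transformers default but is not confirmed for this model.

```python
# Mappings copied from the model card above; CAP codes written as integers.
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}
MODEL_TO_CAP_ISSUE = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture',
    4: 'labour', 5: 'education', 6: 'environment', 7: 'energy',
    8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations', 19: 'culture',
}


def decode_label(label):
    """Map a model label (int, or a string such as 'LABEL_8') to (CAP code, issue)."""
    if isinstance(label, str):
        # Assumption: string labels end in the integer class index, e.g. 'LABEL_8'.
        index = int(label.rsplit("_", 1)[-1])
    else:
        index = int(label)
    return MODEL_TO_CAP_CODE[index], MODEL_TO_CAP_ISSUE[index]
```

For example, `decode_label("LABEL_8")` looks up model label 8, which corresponds to CAP code 9 (immigration) in the tables above.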

Validation

Class	Precision	Recall	F1-score	Support
0	0.72	0.83	0.77	211
1	0.82	0.77	0.79	242
2	0.82	0.86	0.84	251
3	0.92	0.89	0.90	228
4	0.81	0.85	0.83	220
5	0.90	0.93	0.91	244
6	0.87	0.87	0.87	230
7	0.92	0.88	0.90	251
8	0.94	0.90	0.92	237
9	0.87	0.88	0.87	263
10	0.70	0.88	0.78	189
11	0.90	0.81	0.85	248
12	0.87	0.90	0.88	222
13	0.76	0.72	0.74	255
14	0.84	0.84	0.84	241
15	0.92	0.79	0.85	276
16	0.95	0.90	0.92	258
17	0.71	0.82	0.76	200
18	0.77	0.73	0.75	215
19	0.92	0.91	0.92	239
Accuracy			0.85	4720
Macro Avg	0.85	0.85	0.85	4720
Weighted Avg	0.85	0.85	0.85	4720
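
For readers who want to reproduce a table of this shape on their own labelled data, the sketch below shows how per-class precision, recall, F1 and support are conventionally defined, along with the macro average. It is not the code used to produce the figures above; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label in y_true."""
    support = Counter(y_true)        # how many true examples each class has
    predicted = Counter(y_pred)      # how often each class was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for label in sorted(support):
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / support[label]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(row[2] for row in report.values()) / len(report)


# Toy labels for illustration only; these are not the model's validation data.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_report = per_class_report(toy_true, toy_pred)
```

The weighted average in the table differs from the macro average only in that each class is weighted by its support.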

Usage

import torch
from transformers import AutoModelForSequenceClassification
from transformers import TextClassificationPipeline, AutoTokenizer

mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# Use the first GPU if one is available, otherwise fall back to the CPU (-1).
device = 0 if torch.cuda.is_available() else -1
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=device)

# Returns a list with one dict per input, holding the predicted label and its score.
classifier(
    "To ask the Secretary of State for Energy and Climate "
    "Change what estimate he has made of the proportion of carbon "
    "dioxide emissions arising in the UK attributable to burning."
)
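
When classifying many documents, it can help to pass them as a list and let the tokenizer truncate long inputs. The helper below is a minimal sketch: `classify_texts` is a hypothetical name, `classifier` is assumed to be a Transformers text-classification pipeline like the one built above, and the 512-token limit is the usual BERT maximum rather than a value stated in this model card.

```python
def classify_texts(classifier, texts, max_length=512):
    """Classify a batch of texts and return (text, label, score) tuples."""
    texts = list(texts)  # materialise once so generators can be passed in too
    # truncation and max_length are forwarded to the tokenizer, so inputs
    # longer than max_length tokens are cut instead of raising an error.
    results = classifier(texts, truncation=True, max_length=max_length)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


# Usage sketch (requires the pipeline object from the example above):
# rows = classify_texts(classifier, ["First document ...", "Second document ..."])
# for text, label, score in rows:
#     print(label, round(score, 3), text[:60])
```

Truncation means only the first part of a very long document is scored, so splitting long texts into paragraphs before classification may be preferable in some applications.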