---
license: apache-2.0
base_model: bert-base-uncased
tags:
- text-classification
- industrial-policy
- economics
- policy-analysis
- bert
- government-policy
- trade-policy
language:
- en
pipeline_tag: text-classification
widget:
- text: "Government provides subsidies to promote renewable energy development"
  example_title: "IP goal Example"
- text: "Company announces quarterly earnings report"
  example_title: "No IP goal Example"
- text: "The document mentions policy changes"
  example_title: "Not enough information Example"
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
---

# Industrial Policy Classification Model v1.0

This model classifies text documents to determine whether they describe industrial policy goals. It was fine-tuned from bert-base-uncased on a dataset of policy documents and measures.

Accompanies the paper:

Juhász, Réka, Lane, Nathan J., Oehlsen, Emily, and Perez, Veronica C. (2025). Measuring Industrial Policy: A Text-Based Approach. National Bureau of Economic Research. Available at: https://www.nber.org/papers/w33895

The output data is available at: industrialpolicydata.com

## Model Description

This is a BERT-based text classification model trained to identify industrial policy intentions in text. The model can classify text into 3 categories:

- **IP goal** (0): Text describes an industrial policy objective or intervention
- **No IP goal** (1): Text does not describe an industrial policy objective
- **Not enough information** (2): Insufficient information to determine policy intent


The model was trained on expert-annotated policy documents. The input data for this project was provided in 2023 by the Global Trade Alert project. See Global Trade Alert (2025), available at: https://www.globaltradealert.org/

## Intended Use

This model is designed for research purposes to analyze policy documents, government measures, and related texts to identify industrial policy intentions. It can be used by:

- Economics researchers studying industrial policy
- Policy analysts examining government interventions  
- Data scientists working with policy text classification
- Government agencies analyzing policy effectiveness

## Model Performance

- **Accuracy**: 0.941
- **F1 Score**: 0.941
- **Precision**: 0.941
- **Recall**: 0.941
- **Test Loss**: 0.2886

*Metrics evaluated on held-out test set*
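
The card does not state how precision, recall, and F1 are averaged across the three classes. Identical values for all four metrics would be consistent with micro-averaging, under which precision, recall, and F1 all reduce to accuracy in single-label classification. The sketch below illustrates that identity on made-up labels; the `micro_metrics` helper and the toy data are hypothetical and are not taken from the authors' evaluation code.

```python
def micro_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Micro-averaged precision/recall/F1 plus accuracy for single-label data."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correct prediction of this class
            elif pred == label:
                fp += 1  # predicted this class, truth was another
            elif truth == label:
                fn += 1  # truth was this class, predicted another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (illustrative only): 0 = IP goal, 1 = No IP goal, 2 = Not enough information
y_true = [0, 0, 1, 1, 2, 2, 0, 1]
y_pred = [0, 1, 1, 1, 2, 0, 0, 1]

metrics = micro_metrics(y_true, y_pred)
print(metrics)
```

Every misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the true class), which is why the micro-averaged values coincide with accuracy.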

## Training Data

The model was trained on expert-annotated policy documents. The input data for this project was provided by the Global Trade Alert project.

## Training Procedure

### Model Architecture
- **Base model**: bert-base-uncased
- **Architecture**: BertForSequenceClassification
- **Number of labels**: 3
- **Fine-tuning approach**: Full model fine-tuning with classification head

### Training Configuration
- **Optimization**: Hyperparameter tuning using Optuna for optimal performance
- **Data balancing**: Oversampling applied to handle class imbalance
- **Validation strategy**: Stratified splits with income-based validation
- **Cross-validation**: Income group validation to test generalization
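
The card does not specify which oversampling method was used. As a minimal sketch, assuming simple random oversampling with replacement, minority classes can be duplicated until every class matches the size of the largest one. The `random_oversample` function and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until all classes have equal counts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the largest class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Imbalanced toy data (illustrative only)
texts = ["a", "b", "c", "d", "e", "f"]
labels = [0, 0, 0, 1, 1, 2]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

Oversampling is normally applied to the training split only, so that duplicated examples do not leak into validation or test data.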

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer
model_name = "industrialpolicygroup/industrialpolicy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create classification pipeline
classifier = pipeline("text-classification", 
                     model=model, 
                     tokenizer=tokenizer)

# Example usage
text = "Government provides subsidies to promote renewable energy development"
result = classifier(text)
print(result)

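# Expected output format (score value is illustrative):
# [{'label': 'LABEL_0', 'score': 0.95}]
#
# Label mappings (from the Model Description section):
# LABEL_0: IP goal
# LABEL_1: No IP goal
# LABEL_2: Not enough information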
```
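
The pipeline output uses generic names such as `LABEL_0`. A small helper can translate them into the category names listed under Model Description. This sketch assumes the hosted config keeps the `LABEL_<id>` naming shown in the expected output; the `ID2LABEL` dictionary and `decode` function are illustrative and are not part of the released model.

```python
# Category names taken from the Model Description section of this card.
ID2LABEL = {0: "IP goal", 1: "No IP goal", 2: "Not enough information"}


def decode(prediction):
    """Replace a generic 'LABEL_<id>' name with its category name."""
    label_id = int(prediction["label"].rsplit("_", 1)[1])
    return {"label": ID2LABEL[label_id], "score": prediction["score"]}


# Stand-in for the list returned by the pipeline (values are illustrative).
example = [{"label": "LABEL_0", "score": 0.95}]
decoded = [decode(p) for p in example]
print(decoded)
```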

## Limitations and Bias

- The model is trained primarily on English text from the Global Trade Alert project
- Performance may vary on policy domains not well-represented in training data
- The model reflects the annotation guidelines and may not capture all nuances of industrial policy
- The model may be biased toward the types of policy language present in the training data
- May require domain adaptation for highly specialized policy areas

## Evaluation and Validation

The model underwent rigorous evaluation including:
- Standard train/validation/test splits
- Income-based validation across country groups
- Cross-domain evaluation on different policy types
- Comparison with traditional machine learning baselines
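
The card does not describe the income-based validation in detail. One plausible reading, sketched here purely as an assumption, is a leave-one-group-out split: train on documents from all but one country income group and evaluate on the held-out group. The field names, group labels, and `leave_one_group_out` function are hypothetical.

```python
def leave_one_group_out(records, group_key="income_group"):
    """Yield (held_out_group, train_records, test_records) for each group."""
    groups = sorted({r[group_key] for r in records})
    for held_out in groups:
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test


# Toy records (illustrative only)
records = [
    {"text": "doc A", "label": 0, "income_group": "high"},
    {"text": "doc B", "label": 1, "income_group": "high"},
    {"text": "doc C", "label": 0, "income_group": "middle"},
    {"text": "doc D", "label": 2, "income_group": "low"},
]

splits = list(leave_one_group_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A large gap between scores on the held-out group and scores on a random split would suggest the classifier generalizes poorly across income groups.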

## Ethical Considerations

This model is intended for research and analysis purposes. Users should be aware that:
- Policy classification can have implications for economic research and policy recommendations
- The model's outputs should be interpreted by domain experts
- Results should be validated against human expert judgment for critical applications

## Citation

If you use this model in your research, please cite:

```bibtex
@techreport{NBERw33895,
 title = "Measuring Industrial Policy: A Text-Based Approach",
 author = "Juhász, Réka and Lane, Nathan J and Oehlsen, Emily and Perez, Veronica C",
 institution = "National Bureau of Economic Research",
 type = "Working Paper",
 series = "Working Paper Series",
 number = "33895",
 year = "2025",
 month = "June",
 doi = {10.3386/w33895},
 URL = "http://www.nber.org/papers/w33895",
 abstract = {Since the 18th century, policymakers have debated the merits of industrial policy (IP). Yet, economists lack basic facts about its use due to measurement challenges. We propose a new approach to IP measurement based on information contained in policy text. We show how off-the-shelf supervised machine learning tools can be used to categorize industrial policies at scale. Using this approach, we validate longstanding concerns with earlier approaches to measurement which conflate IP with other types of policy. We apply our methodology to a global database of commercial policy descriptions, and provide a first look at IP use at the country, industry, and year levels (2010-2022). The new data on IP suggest that i) IP is on the rise; ii) modern IP tends to use subsidies and export promotion measures as opposed to tariffs; iii) rich countries heavily dominate IP use; iv) IP tends to target sectors with an established comparative advantage, particularly in high-income countries.},
}
```

## Model Details

- **Developed by**: Industrial Policy Group
- **Model type**: Text Classification (BERT-based)
- **Language**: English
- **License**: Apache 2.0
- **Fine-tuned from**: bert-base-uncased

## Technical Specifications

### Architecture Details
- **Model Type**: BERT
- **Architecture Class**: BertForSequenceClassification
- **Transformers Version**: 4.52.4

### Model Dimensions
- **Vocabulary Size**: 30,522
- **Hidden Size**: 768
- **Number of Attention Heads**: 12
- **Number of Hidden Layers**: 12
- **Intermediate Size**: 3,072
- **Max Position Embeddings**: 512
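
As a consistency check, the dimensions above can be combined into a parameter count. The sketch assumes the standard BERT layout (a token-type vocabulary of 2, a pooler layer, biases on all linear layers, and a 3-way linear classification head); those details are not listed in this card. The number of attention heads does not affect the count because the heads partition the hidden size.

```python
def bert_classifier_param_count(
    vocab_size=30522,
    hidden=768,
    layers=12,
    intermediate=3072,
    max_positions=512,
    type_vocab_size=2,  # assumption: standard BERT value, not stated in the card
    num_labels=3,
):
    """Count weights and biases of a BERT encoder with a linear classifier head."""
    layer_norm = 2 * hidden  # scale and bias vectors
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + layer_norm
    # Query, key, value, and output projections, followed by a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Up-projection, down-projection, followed by a LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    encoder = layers * (attention + feed_forward)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + pooler + classifier


total = bert_classifier_param_count()
size_mib = total * 4 / 2**20  # 4 bytes per float32 parameter
print(f"{total:,} parameters, about {size_mib:.0f} MiB in float32")
```

Under these assumptions the count comes to 109,484,547 parameters, or roughly 418 MiB at 4 bytes each.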

### Training Configuration
- **Hidden Dropout Probability**: 0.1
- **Attention Dropout Probability**: 0.1
- **Layer Norm Epsilon**: 1e-12
- **Initializer Range**: 0.02

### Classification Configuration
- **Number of Labels**: 3
- **Problem Type**: single_label_classification
- **Padding Token ID**: 0
- **Position Embedding Type**: absolute
- **Torch Dtype**: float32
- **Use Cache**: True

### Model Size and Requirements
- **Model Size**: ~109M parameters (~418MB on disk)
- **Input**: Text (up to 512 tokens)
- **Output**: Classification probabilities for 3 classes
- **Framework**: PyTorch + Transformers
- **Precision**: float32
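
Class probabilities are typically obtained by applying a softmax to the three raw logits produced by the classification head. A minimal, dependency-free sketch follows; the logit values are made up for illustration, and the class names come from the Model Description.

```python
import math

CLASS_NAMES = ["IP goal", "No IP goal", "Not enough information"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input text (illustrative only)
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
predicted = CLASS_NAMES[probs.index(max(probs))]
print(dict(zip(CLASS_NAMES, probs)), predicted)
```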

## Citations for source data

Global Trade Alert (2025). Global Trade Alert Database. Available at: https://www.globaltradealert.org/


## Contact

For questions about this model or the research, please contact the Industrial Policy Group.

---

*Model card auto-generated on 2025-06-19 14:07:03 from model files*
*Source model: bert-base-uncased-3_classes-finetuned_hub_ready_20250617_151525*