# PoliBERT-MY Model Card

- Model Name: PoliBERT-MY
- Base Model: BERT-base (uncased)
- Task: Multi-label, multi-class classification of Malaysian political texts
- Output: For each input text, one of four labels (unknown, negative, neutral, positive) for each of 12 topics.
## Model Overview
PoliBERT-MY is a fine-tuned BERT-base model designed to classify political documents and news articles from Malaysia. It outputs predictions on 12 distinct topics:
- Democracy
- Economy
- Race
- Leadership
- Development
- Corruption
- Political Instability
- Safety
- Administration
- Education
- Religion
- Environment
For each topic, the model assigns one of four sentiment labels: unknown, negative, neutral, or positive.
## Intended Use
- Political Analysis: Extracts topic-specific sentiment from Malaysian news articles and online comments.
- Media Monitoring: Automatically categorizes news and social media content to identify political trends and biases.
- Research: Serves as a case study for multi-label, multi-class classification in a politically sensitive domain.
## Data Sources
The training data was aggregated from multiple sources:
Data Source | N | Labeling Method |
---|---|---|
English Newspaper | 5912 | BERT (MyPoliBERT-ver03) |
English Newspaper Comments (Facebook) | 8471 | BERT (MyPoliBERT-ver03) |
Malay Newspaper | 5254 | OpenAI API (translated to English, then classified) |
Chinese Newspaper | 2480 | OpenAI API (translated to English, then classified) |
Tamil Newspaper | 1512 | OpenAI API (translated to English, then classified) |
Reddit | 20000 | BERT (MyPoliBERT-ver03) |
Manifesto BN | 98 | OpenAI API |
Manifesto PH | 180 | OpenAI API |
Manifesto PN | 15 | OpenAI API |
Synthetic Data | 4124 | OpenAI API |
- NOTE: The originally aggregated dataset drew on all of the sources above (English newspapers, Facebook comments, Malay, Chinese, and Tamil newspapers, Reddit, party manifestos, and synthetic data) and contained some noise and misclassifications; after removing the noisy entries, 47,966 clean data points were used for training.
## Labeling Method Details
### BERT-based Labeling
- Method: For the primarily English-language news articles and Facebook comments, labels were produced with an existing fine-tuned BERT classifier (see the sketch below).
- Implementation: The YagiASAFAS/MyPoliBERT-ver03 model was used to classify the texts directly.
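A minimal sketch of this direct-classification step, assuming the checkpoint exposes a standard sequence-classification head whose 48 logits are laid out topic-major (12 topics × 4 sentiments); the topic ordering in the code is illustrative:

```python
# Hedged sketch: label English texts with the YagiASAFAS/MyPoliBERT-ver03 checkpoint.
# The (12, 4) topic-major logit layout and the topic order are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TOPICS = ["democracy", "economy", "race", "leadership", "development", "corruption",
          "instability", "safety", "administration", "education", "religion", "environment"]
LABELS = ["unknown", "negative", "neutral", "positive"]

tokenizer = AutoTokenizer.from_pretrained("YagiASAFAS/MyPoliBERT-ver03")
model = AutoModelForSequenceClassification.from_pretrained("YagiASAFAS/MyPoliBERT-ver03").eval()

def label_text(text: str) -> dict:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits            # assumed shape: (1, 48)
    per_topic = logits.view(len(TOPICS), len(LABELS))
    picks = per_topic.argmax(dim=-1)               # best sentiment index per topic
    return {topic: LABELS[i] for topic, i in zip(TOPICS, picks.tolist())}
```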
### OpenAI API Labeling
- Method: For non-English news articles (Malay, Chinese, Tamil), texts were first translated into English and then labeled.
- Process:
  - Translation: A translation prompt was used to convert non-English texts into English.
  - Classification: After translation, a classification prompt was used to assign labels.
- Additional Details: Labels were generated with the OpenAI API (gpt-4o-mini) in a human-in-the-loop workflow: candidate prompts were engineered iteratively, and the prompt that produced the most accurate labels was selected, as in the sketch below.
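A minimal sketch of the translate-then-classify flow with the OpenAI Python client; the prompt texts and the sample Malay sentence are illustrative stand-ins, not the engineered prompts that were actually selected:

```python
# Hedged sketch of the two-step OpenAI labeling flow; prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate_to_english(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Translate the user's text into English. Output only the translation."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

def classify(text_en: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "For each of the 12 topics (democracy, economy, race, leadership, development, "
                "corruption, political instability, safety, administration, education, religion, "
                "environment) output one of: unknown, negative, neutral, positive, "
                "as a JSON object keyed by topic.")},
            {"role": "user", "content": text_en},
        ],
    )
    return resp.choices[0].message.content

# Example: a Malay sentence ("The government must tackle corruption more transparently.")
labels_json = classify(translate_to_english("Kerajaan mesti menangani rasuah dengan lebih telus."))
```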
### Synthetic Data via Data Augmentation
- Method: Synthetic data was generated to balance the dataset by augmenting underrepresented labels or sentiments.
- Implementation: The OpenAI API (again with human-in-the-loop prompt engineering) was used to generate artificial examples for topic/sentiment combinations that are absent or scarce in the original dataset, as sketched below. This synthetic data was then mixed with the original data to improve label balance.
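A sketch of what such augmentation can look like; the prompt wording and the target topic/sentiment pair are assumptions for illustration:

```python
# Hedged sketch of prompt-based augmentation for a scarce topic/sentiment pair.
from openai import OpenAI

client = OpenAI()

def generate_synthetic(topic: str, sentiment: str, n: int = 5) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (f"Write {n} short, realistic reader comments on Malaysian politics "
                        f"expressing a {sentiment} view of {topic}. One comment per line."),
        }],
    )
    return resp.choices[0].message.content.splitlines()

extra_rows = generate_synthetic("environment", "negative")  # illustrative target pair
```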
## Training Details
Hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Seed: 42
- Gradient Accumulation Steps: 4 (Total Train Batch Size = 16 × 4 = 64)
- Optimizer: ADAMW_TORCH (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler Type: Linear
- LR Warmup Steps: 500
- Number of Epochs: 5
- Mixed Precision Training: Native AMP
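For reference, these settings map onto Hugging Face `TrainingArguments` roughly as follows (`output_dir` is a placeholder, not taken from the card; the betas and epsilon shown are the `adamw_torch` defaults):

```python
# The hyperparameters above expressed as TrainingArguments; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="polibert-my",           # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,      # effective train batch size: 16 * 4 = 64
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    optim="adamw_torch",                # AdamW with default betas=(0.9, 0.999), eps=1e-8
    seed=42,
    fp16=True,                          # native AMP mixed precision
)
```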
Label Imbalance Correction:
A correction factor was computed for each topic based on the number of non-'unknown' samples to mitigate label imbalance. The correction weight for each topic was calculated as:

`weight(topic) = (average non-'unknown' count across topics) / (non-'unknown' count for the topic)`
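A minimal sketch of this computation; the counts are placeholders, not the real dataset statistics:

```python
# Per-topic correction weights from the formula above; counts are made-up placeholders.
non_unknown_counts = {"democracy": 21000, "leadership": 34000, "environment": 6000}

avg = sum(non_unknown_counts.values()) / len(non_unknown_counts)
weights = {topic: avg / n for topic, n in non_unknown_counts.items()}
# Topics with fewer non-'unknown' samples receive weights > 1 and thus count more in the loss.
```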
## Evaluation Results
The model achieved the following results on the evaluation set:
- Loss: 0.1928
- Democracy: F1 = 0.9556, Accuracy = 0.9574
- Economy: F1 = 0.9352, Accuracy = 0.9381
- Race: F1 = 0.9569, Accuracy = 0.9580
- Leadership: F1 = 0.8411, Accuracy = 0.8457
- Development: F1 = 0.9222, Accuracy = 0.9269
- Corruption: F1 = 0.9611, Accuracy = 0.9627
- Instability: F1 = 0.9462, Accuracy = 0.9492
- Safety: F1 = 0.9213, Accuracy = 0.9258
- Administration: F1 = 0.9367, Accuracy = 0.9412
- Education: F1 = 0.9661, Accuracy = 0.9678
- Religion: F1 = 0.9590, Accuracy = 0.9598
- Environment: F1 = 0.9808, Accuracy = 0.9821
- Overall: F1 = 0.9402, Accuracy = 0.9429
## Training Results by Epoch
Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.2762 | 1.0 | 600 | 0.2618 | 0.9216 | 0.9410 | 0.8961 | 0.9121 | 0.9179 | 0.9339 | 0.7244 | 0.7770 | 0.8460 | 0.8856 | 0.9274 | 0.9416 | 0.8918 | 0.9236 | 0.8792 | 0.8998 | 0.8800 | 0.9163 | 0.9518 | 0.9588 | 0.9355 | 0.9454 | 0.9718 | 0.9757 | 0.8953 | 0.9176 |
0.2 | 2.0 | 1200 | 0.2052 | 0.9428 | 0.9518 | 0.9226 | 0.9292 | 0.9507 | 0.9542 | 0.7889 | 0.8134 | 0.8957 | 0.9128 | 0.9551 | 0.9587 | 0.9396 | 0.9465 | 0.9130 | 0.9185 | 0.9296 | 0.9375 | 0.9648 | 0.9664 | 0.9558 | 0.9577 | 0.9799 | 0.9817 | 0.9282 | 0.9357 |
0.1426 | 3.0 | 1800 | 0.1916 | 0.9538 | 0.9574 | 0.9318 | 0.9351 | 0.9564 | 0.9582 | 0.8296 | 0.8378 | 0.9163 | 0.9235 | 0.9586 | 0.9591 | 0.9468 | 0.9484 | 0.9200 | 0.9230 | 0.9331 | 0.9393 | 0.9648 | 0.9673 | 0.9582 | 0.9589 | 0.9826 | 0.9838 | 0.9377 | 0.9410 |
0.103 | 4.0 | 2400 | 0.1908 | 0.9548 | 0.9579 | 0.9348 | 0.9364 | 0.9570 | 0.9582 | 0.8368 | 0.8416 | 0.9214 | 0.9261 | 0.9615 | 0.9627 | 0.9460 | 0.9491 | 0.9209 | 0.9253 | 0.9370 | 0.9418 | 0.9675 | 0.9690 | 0.9602 | 0.9607 | 0.9809 | 0.9820 | 0.9399 | 0.9426 |
0.0838 | 4.9921 | 2995 | 0.1928 | 0.9556 | 0.9574 | 0.9352 | 0.9381 | 0.9569 | 0.9580 | 0.8411 | 0.8457 | 0.9222 | 0.9269 | 0.9611 | 0.9627 | 0.9462 | 0.9492 | 0.9213 | 0.9258 | 0.9367 | 0.9412 | 0.9661 | 0.9678 | 0.9590 | 0.9598 | 0.9808 | 0.9821 | 0.9402 | 0.9429 |
## Usage
### Inference
- Input: English text (or text translated into English)
- Output: A JSON object with 12 keys (one for each topic) containing one of the labels: unknown, negative, neutral, or positive.
- The model selects the sentiment with the highest probability for each topic.
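An end-to-end inference sketch along these lines, again assuming a topic-major (12 × 4) logit layout; the sample sentence is illustrative:

```python
# Hedged inference sketch for PoliBERT-MY: argmax sentiment per topic, emitted as JSON.
import json
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TOPICS = ["democracy", "economy", "race", "leadership", "development", "corruption",
          "instability", "safety", "administration", "education", "religion", "environment"]
LABELS = ["unknown", "negative", "neutral", "positive"]

tokenizer = AutoTokenizer.from_pretrained("YagiASAFAS/PoliBERT-MY")
model = AutoModelForSequenceClassification.from_pretrained("YagiASAFAS/PoliBERT-MY").eval()

text = "The new budget prioritises rural schools and teacher training."  # illustrative input
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.view(len(TOPICS), len(LABELS)).softmax(dim=-1)
preds = probs.argmax(dim=-1)  # highest-probability sentiment per topic
print(json.dumps({t: LABELS[i] for t, i in zip(TOPICS, preds.tolist())}, indent=2))
```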
### Fine-Tuning with Best Hyperparameters
After hyperparameter search, update your training arguments using the best hyperparameters and reinitialize the Trainer:
```python
# After hyperparameter search:
best_run = trainer.hyperparameter_search(direction='maximize', hp_space=hp_space, n_trials=5)
print('Best hyperparameters:', best_run.hyperparameters)

# Update TrainingArguments accordingly
training_args.learning_rate = best_run.hyperparameters['learning_rate']
training_args.num_train_epochs = best_run.hyperparameters['num_train_epochs']
training_args.gradient_accumulation_steps = best_run.hyperparameters['gradient_accumulation_steps']

# Reinitialize the Trainer with the updated arguments
trainer = CustomTrainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[early_stopping_callback],
    label_weights_dict=label_weights,
)
trainer.train()
```
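The snippet above assumes an `hp_space` function is already defined for the search backend; a minimal Optuna-style sketch (with illustrative ranges, not the ones used for this model) might look like:

```python
# Hypothetical Optuna search space for trainer.hyperparameter_search; ranges are illustrative.
def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 6),
        "gradient_accumulation_steps": trial.suggest_categorical(
            "gradient_accumulation_steps", [2, 4, 8]
        ),
    }
```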