CBDC-Type-BERT: Classifying Retail vs Wholesale vs General CBDC Sentences
A domain-specialized BERT classifier that labels central-bank text about CBDCs into three categories:
- Retail CBDC – statements about a general-purpose CBDC for the public (households, merchants, wallets, offline use, legal tender for everyday payments, holding limits, tiered remuneration, “digital euro/pound/rupee” for citizens, etc.).
- Wholesale CBDC – statements about a financial-institution CBDC (RTGS/settlement, DLT platforms, PvP/DvP, tokenised assets/markets, interbank use, central-bank reserves on ledger, etc.).
- General/Unspecified – CBDC mentions that don’t clearly indicate retail or wholesale scope, or discuss CBDCs at a conceptual/policy level without specifying the type.
Training data: 1,417 manually annotated CBDC sentences from BIS central-bank speeches — Retail CBDC (543), Wholesale CBDC (329), and General/Unspecified (545) — split 80/10/10 (train/validation/test) with stratification.
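The split is a standard stratified hold-out. The sketch below shows one way to reproduce it, assuming the annotated sentences sit in a CSV with "text" and "label" columns; the file name, column names, and random seed are illustrative, not taken from the original pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative file/column names; the original annotation file is not published in this card.
df = pd.read_csv("cbdc_sentences.csv")          # columns: "text", "label"

# First carve out 20% for validation+test, stratified on the label...
train_df, holdout_df = train_test_split(
    df, test_size=0.20, stratify=df["label"], random_state=42
)
# ...then split that 20% evenly into validation and test (10% of the full data each).
val_df, test_df = train_test_split(
    holdout_df, test_size=0.50, stratify=holdout_df["label"], random_state=42
)

print(len(train_df), len(val_df), len(test_df))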
Base model: bilalzafar/CentralBank-BERT
- CentralBank-BERT is a domain-adapted BERT trained on ~2M sentences (66M tokens) of central bank speeches (BIS, 1996–2024). It captures monetary-policy and payments vocabulary far better than generic BERT, which materially helps downstream CBDC classification.
Preprocessing, Class Weights & Training
Light manual cleaning was performed (trimming whitespace, normalizing quotes/dashes, de-duplicating, dropping nulls), and sentences were tokenized with the bilalzafar/CentralBank-BERT WordPiece tokenizer (max length 192). Because Wholesale had fewer examples, inverse-frequency class weights were applied in CrossEntropyLoss to balance learning (train-split weights ≈ General 0.866, Retail 0.870, Wholesale 1.436). The model was fine-tuned with AdamW (learning rate 2e-5, weight decay 0.01, warmup ratio 0.1) and batch sizes of 8/16 (train/eval) for 5 epochs with fp16 mixed precision. Early stopping monitored validation macro-F1 (patience = 2), and the best checkpoint was restored at the end. Training ran on a single Colab GPU.
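The training script itself is not included in this card; the following is a minimal sketch of the setup described above, continuing from the split sketch and assuming an illustrative label2id ordering, output directory, and compute_metrics helper (none of these names are confirmed by the original).

import torch
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

# Assumed label ids; the published model's actual id2label mapping may differ.
label2id = {"General/Unspecified": 0, "Retail CBDC": 1, "Wholesale CBDC": 2}
id2label = {v: k for k, v in label2id.items()}

base = "bilalzafar/CentralBank-BERT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=3, label2id=label2id, id2label=id2label)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=192)

# train_df / val_df come from the stratified split sketched earlier.
train_ds = Dataset.from_pandas(train_df.assign(labels=train_df["label"].map(label2id))).map(tokenize, batched=True)
val_ds = Dataset.from_pandas(val_df.assign(labels=val_df["label"].map(label2id))).map(tokenize, batched=True)

# Inverse-frequency class weights on the train split (values reported above),
# ordered to match the assumed label ids: General, Retail, Wholesale.
class_weights = torch.tensor([0.866, 0.870, 1.436])

class WeightedTrainer(Trainer):
    # Use class-weighted cross-entropy instead of the default unweighted loss.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"macro_f1": f1_score(labels, logits.argmax(axis=-1), average="macro")}

args = TrainingArguments(
    output_dir="cbdc-type",                 # illustrative output directory
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    fp16=True,                              # requires a GPU
    eval_strategy="epoch",                  # "evaluation_strategy" on older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=tokenizer,                    # enables dynamic padding via DataCollatorWithPadding
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()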
Performance & Evaluation
On the 10% held-out test set, the model achieved 88.7% accuracy, 0.898 macro-F1, and 0.887 weighted-F1. Class-wise F1 was ≈ 0.86 for Retail, ≈ 0.97 for Wholesale, and ≈ 0.86 for General/Unspecified, indicating particularly high precision and recall on Wholesale and balanced, reliable performance on Retail and General.
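These metrics can be reproduced with a scikit-learn report over the held-out split. The sketch below continues from the training sketch above, with test_ds as an assumed test dataset tokenized in the same way as train_ds and val_ds.

from sklearn.metrics import accuracy_score, classification_report, f1_score

# Predict on the held-out 10% test split with the fine-tuned model.
pred = trainer.predict(test_ds)
y_true = pred.label_ids
y_pred = pred.predictions.argmax(axis=-1)

print("accuracy   :", accuracy_score(y_true, y_pred))
print("macro-F1   :", f1_score(y_true, y_pred, average="macro"))
print("weighted-F1:", f1_score(y_true, y_pred, average="weighted"))

# Per-class precision/recall/F1, using the assumed id2label ordering from the training sketch.
print(classification_report(y_true, y_pred,
                            target_names=[id2label[i] for i in range(3)]))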
Usage
from transformers import pipeline

model = "bilalzafar/CBDC-Type"
clf = pipeline("text-classification", model=model, tokenizer=model,
               truncation=True, max_length=192)

texts = [
    "The digital euro will be available to citizens and merchants for daily payments.",          # Retail
    "DLT-based interbank settlement with a central bank liability will lower PvP risk.",         # Wholesale
    "Several central banks are assessing CBDCs to modernise payments and policy transmission.",  # General
]

for t in texts:
    out = clf(t)[0]
    print(f"{out['label']:>20} {out['score']:.3f} | {t}")
Model tree for bilalzafar/CBDC-Type
Base model: google-bert/bert-base-uncased (fine-tuned via bilalzafar/CentralBank-BERT)