BERT-Breaks (v0) – Coming Soon 🚧

Status: Model training and evaluation are planned; this repository is currently a baseline placeholder.

Overview

BERT-Breaks-v0 serves as the vanilla BERT baseline for the Exception Handling & Reconciliation project.
It will be trained on the same corpus as our DistilBERT-Reconciler – 3.2M labeled post-trade break descriptions and resolution actions – but using the original bert-base-uncased architecture.

The goal is to provide a performance benchmark against which lightweight and distilled models can be evaluated.

Intended Use

Automated classification of reconciliation exceptions in fixed-income settlement workflows (instruments identified by CUSIP or ISIN).
The model will output a label_id that maps to a human-readable root cause and a recommended resolution step.
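
The label_id lookup described above could be sketched roughly as follows. The label taxonomy is not published in this card, so the two entries, the BreakLabel type, and the describe helper are all hypothetical placeholders rather than the project's actual mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakLabel:
    root_cause: str
    resolution: str


# Hypothetical entries: the real label set is not listed in this card.
LABEL_MAP = {
    0: BreakLabel("Quantity mismatch", "Confirm trade quantity with counterparty"),
    1: BreakLabel("Missing settlement instruction", "Request instruction from counterparty"),
}

# Fallback for ids outside the table (an assumption, not a documented behavior).
UNKNOWN = BreakLabel("Unknown", "Route to manual review")


def describe(label_id: int) -> BreakLabel:
    """Translate a predicted label_id into a root cause and resolution step."""
    return LABEL_MAP.get(label_id, UNKNOWN)


label = describe(0)
print(f"{label.root_cause}: {label.resolution}")
```

Routing unknown ids to manual review is one possible design choice; it keeps an unexpected prediction from silently triggering an automated resolution.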

Planned Training Details

Base: bert-base-uncased
Epochs: TBD (expected 3–5)
Learning Rate: TBD (expected ~3e-5)
Max Length: 256
Dataset: Proprietary + ISO 20022-derived corpus (post-trade break descriptions)
Split: 80% train / 20% hold-out
Evaluation Metrics: Accuracy, Micro-F1, Macro-F1
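
As a rough illustration, the planned settings above could be captured in a small configuration object together with an 80/20 split. The TrainingPlan and split_examples names are hypothetical, the epoch and learning-rate values are placeholders taken from the low end of the expected ranges, and the simple random split is an assumption; the actual pipeline may split differently (for example, stratified by label).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPlan:
    base_model: str = "bert-base-uncased"
    epochs: int = 3               # placeholder: listed as TBD, expected 3-5
    learning_rate: float = 3e-5   # placeholder: listed as TBD, expected ~3e-5
    max_length: int = 256
    train_fraction: float = 0.8   # 80% train / 20% hold-out


def split_examples(examples, train_fraction, seed=0):
    """Shuffle a copy of the examples and return (train, hold_out)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


plan = TrainingPlan()
train, hold_out = split_examples(range(10), plan.train_fraction)
print(len(train), len(hold_out))
```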

Expected Benchmark

Model	Accuracy	Micro-F1	Macro-F1
DistilBERT-Reconciler	0.88	0.88	0.85
BERT-Breaks-v0	(Coming)	(Coming)	(Coming)
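
For reference, the three metrics in the table can be computed as in the sketch below. It assumes each break receives exactly one label (single-label classification), in which case micro-F1 equals accuracy, which would be consistent with the matching 0.88 values in the DistilBERT-Reconciler row. The classification_metrics function and the toy labels are illustrative only, not the project's evaluation code.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, micro-F1, and macro-F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    accuracy = sum(tp.values()) / len(y_true)
    # Micro-F1 pools counts across classes; macro-F1 averages per-class F1.
    micro_f1 = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro_f1 = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return {"accuracy": accuracy, "micro_f1": micro_f1, "macro_f1": macro_f1}


# Toy data, not real evaluation results.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Macro-F1 weights every class equally, so it tends to fall below micro-F1 when rare break types are classified less reliably than common ones.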

Limitations & Bias

Labels are derived from North American corporate-bond desks (2019–2025).
The model may underperform on equities, repos, or non-USD instruments without re-training.
The baseline is expected to have higher inference latency than the distilled variants.

Citation

Musodza, K. (2025). Bond Settlement Automated Exception Handling and Reconciliation. Zenodo. https://doi.org/10.5281/zenodo.16828730

Related Models

DistilBERT-Reconciler – Fine-tuned lightweight alternative.
Streaming-fail-forecaster – Next-day settlement-fail forecasting models.
settlement-stress-flagger-v1 – CUSIP-level stress-event classifier.