
gravitee-io/bert-small-pii-detection 🚀

This model is based on prajjwal1/bert-small, a distilled and efficient version of BERT. It features 4 encoder layers, a hidden size of 512, and 8 attention heads, making it significantly lighter and faster than bert-base-uncased while retaining strong performance on downstream tasks.

The original bert-small model was pre-trained on English corpora following standard masked language modeling objectives. It is particularly suitable for real-time inference use cases and edge deployments where computational efficiency is critical.

This model has been fine-tuned for Named Entity Recognition (NER) on synthetic multilingual financial PII data from the Gretel.ai dataset. The focus is on detecting sensitive personal information across financial contexts, optimized for English (language: en).
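A minimal usage sketch with the 🤗 Transformers token-classification pipeline. The `aggregation_strategy` setting and the example text are illustrative assumptions, not part of the model's documented interface:

```python
from transformers import pipeline

# Hedged sketch: assumes the model is compatible with the standard
# token-classification pipeline. aggregation_strategy="simple" merges
# sub-word tokens back into whole entity spans.
ner = pipeline(
    "token-classification",
    model="gravitee-io/bert-small-pii-detection",
    aggregation_strategy="simple",
)

text = "Contact John Doe at john.doe@example.com or +1 555 0100."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], f"{entity['score']:.2f}")
```

Running this requires downloading the model weights, so it is shown here as a sketch rather than a verified snippet.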


Evaluation Methodology

Entity Mappings and Label Adjustments

As part of the fine-tuning process, the entity set was intentionally modified and optimized to better reflect our business-specific requirements and real-world usage scenarios.

Key adjustments include:

  • Merging or splitting certain entity types to improve classification performance,
  • Renaming labels for consistency and clarity in downstream applications,
  • Adding or removing entities based on their relevance to financial Personally Identifiable Information (PII) detection.

The full mapping of original entity labels to the adjusted set is provided in the entity_mappings file.

All reported evaluation metrics are calculated after applying this mapping, ensuring they accurately reflect the model's performance in our target setup.
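As an illustration, applying such a label mapping to predicted or gold spans before scoring can be sketched in plain Python. The mapping entries below are hypothetical; the authoritative mapping is the one in the entity_mappings file:

```python
# Hypothetical mapping from original dataset labels to the adjusted
# entity set; the real mapping lives in the entity_mappings file.
ENTITY_MAPPING = {
    "first_name": "name",
    "last_name": "name",
    "company_name": "company",
    "credit_card_number": "misc",
}

def remap_label(label: str) -> str:
    """Map an original label to the adjusted set; unmapped labels pass through."""
    return ENTITY_MAPPING.get(label, label)

def remap_entities(entities):
    """Return copies of entity spans with their labels remapped."""
    return [{**e, "label": remap_label(e["label"])} for e in entities]

gold = [{"label": "first_name", "start": 8, "end": 12}]
print(remap_entities(gold))  # [{'label': 'name', 'start': 8, 'end': 12}]
```

Applying the same remapping to both gold and predicted spans before evaluation is what makes the reported metrics reflect the adjusted entity set.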


Matching Schemes

The model was evaluated using nervaluate, with two matching schemes to reflect both practical and strict performance scenarios:

  • Exact Match:

    • An entity prediction is considered correct only if the predicted entity exactly matches both the entity boundaries (start and end tokens) and the entity label.
    • This approach penalizes both boundary errors and misclassifications, providing a conservative estimate of model performance, which is useful for applications where exact localization is critical (e.g., redaction).
  • Entity Type Match:

    • An entity is counted as correct if there is any overlap between the predicted span and the ground truth span, and the predicted label matches the true label.
    • This scheme is more permissive and rewards partial matches, suitable for exploratory analysis and scenarios where partial detection is still valuable (e.g., highlighting sensitive spans).
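The two criteria can be sketched for a single gold/predicted span pair, assuming half-open [start, end) character offsets (nervaluate's actual implementation also tracks boundary-only and partial-match cases):

```python
def exact_match(gold: dict, pred: dict) -> bool:
    """Exact match: boundaries and label must all agree."""
    return (gold["start"] == pred["start"]
            and gold["end"] == pred["end"]
            and gold["label"] == pred["label"])

def entity_type_match(gold: dict, pred: dict) -> bool:
    """Entity-type match: labels agree and the spans overlap at all."""
    overlaps = pred["start"] < gold["end"] and gold["start"] < pred["end"]
    return overlaps and gold["label"] == pred["label"]

gold = {"label": "name", "start": 8, "end": 16}
pred = {"label": "name", "start": 8, "end": 12}  # right label, short span
print(exact_match(gold, pred))        # False: end boundary differs
print(entity_type_match(gold, pred))  # True: spans overlap, same label
```

This is why the Entity Type scores in the tables below are consistently higher than the Exact Match scores: every exact match also counts as an entity-type match, but not vice versa.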

For more detail on the evaluation methodology, see the article about nervaluate.


Metrics

Entity Type

| model | language | precision | recall | f1-score |
|---|---|---|---|---|
| prajjwal1-bert-small_1_onnx | English | 0.90 | 0.88 | 0.89 |
| prajjwal1-bert-small_1_onnx_quant | English | 0.90 | 0.88 | 0.89 |

Exact Match

| model | language | precision | recall | f1-score |
|---|---|---|---|---|
| prajjwal1-bert-small_1_onnx | English | 0.82 | 0.80 | 0.81 |
| prajjwal1-bert-small_1_onnx_quant | English | 0.82 | 0.80 | 0.81 |

Entity Type (entity)

| model | language | entity | precision | recall | f1-score |
|---|---|---|---|---|---|
| prajjwal1-bert-small_1_onnx | English | company | 0.84 | 0.84 | 0.84 |
| prajjwal1-bert-small_1_onnx | English | date_time | 0.90 | 0.91 | 0.91 |
| prajjwal1-bert-small_1_onnx | English | email | 0.98 | 0.96 | 0.97 |
| prajjwal1-bert-small_1_onnx | English | misc | 0.90 | 0.87 | 0.88 |
| prajjwal1-bert-small_1_onnx | English | name | 0.94 | 0.85 | 0.89 |
| prajjwal1-bert-small_1_onnx | English | phone_number | 0.89 | 0.94 | 0.91 |
| prajjwal1-bert-small_1_onnx | English | street_address | 0.90 | 0.86 | 0.88 |
| prajjwal1-bert-small_1_onnx_quant | English | company | 0.84 | 0.85 | 0.84 |
| prajjwal1-bert-small_1_onnx_quant | English | date_time | 0.90 | 0.91 | 0.91 |
| prajjwal1-bert-small_1_onnx_quant | English | email | 0.98 | 0.96 | 0.97 |
| prajjwal1-bert-small_1_onnx_quant | English | misc | 0.89 | 0.86 | 0.88 |
| prajjwal1-bert-small_1_onnx_quant | English | name | 0.94 | 0.85 | 0.89 |
| prajjwal1-bert-small_1_onnx_quant | English | phone_number | 0.88 | 0.94 | 0.91 |
| prajjwal1-bert-small_1_onnx_quant | English | street_address | 0.90 | 0.86 | 0.88 |

Exact Match (entity)

| model | language | entity | precision | recall | f1-score |
|---|---|---|---|---|---|
| prajjwal1-bert-small_1_onnx | English | company | 0.79 | 0.79 | 0.79 |
| prajjwal1-bert-small_1_onnx | English | date_time | 0.75 | 0.76 | 0.76 |
| prajjwal1-bert-small_1_onnx | English | email | 0.93 | 0.91 | 0.92 |
| prajjwal1-bert-small_1_onnx | English | misc | 0.83 | 0.80 | 0.81 |
| prajjwal1-bert-small_1_onnx | English | name | 0.89 | 0.81 | 0.85 |
| prajjwal1-bert-small_1_onnx | English | phone_number | 0.89 | 0.94 | 0.91 |
| prajjwal1-bert-small_1_onnx | English | street_address | 0.86 | 0.82 | 0.84 |
| prajjwal1-bert-small_1_onnx_quant | English | company | 0.78 | 0.79 | 0.78 |
| prajjwal1-bert-small_1_onnx_quant | English | date_time | 0.75 | 0.76 | 0.76 |
| prajjwal1-bert-small_1_onnx_quant | English | email | 0.93 | 0.91 | 0.92 |
| prajjwal1-bert-small_1_onnx_quant | English | misc | 0.82 | 0.79 | 0.81 |
| prajjwal1-bert-small_1_onnx_quant | English | name | 0.89 | 0.81 | 0.85 |
| prajjwal1-bert-small_1_onnx_quant | English | phone_number | 0.88 | 0.93 | 0.91 |
| prajjwal1-bert-small_1_onnx_quant | English | street_address | 0.86 | 0.82 | 0.84 |

Citation

@misc{bhargava2021generalization,
      title={Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics}, 
      author={Prajjwal Bhargava and Aleksandr Drozd and Anna Rogers},
      year={2021},
      eprint={2110.01518},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@article{DBLP:journals/corr/abs-1908-08962,
  author    = {Iulia Turc and
               Ming{-}Wei Chang and
               Kenton Lee and
               Kristina Toutanova},
  title     = {Well-Read Students Learn Better: The Impact of Student Initialization
               on Knowledge Distillation},
  journal   = {CoRR},
  volume    = {abs/1908.08962},
  year      = {2019},
  url       = {http://arxiv.org/abs/1908.08962},
  eprinttype = {arXiv},
  eprint    = {1908.08962},
  timestamp = {Thu, 29 Aug 2019 16:32:34 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1908-08962.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Model size: 28.5M parameters (tensor type F32).