lekhansh/modern-gliner-large-mednotes-deidentifier
Model Summary
This is a fine-tuned version of knowledgator/modern-gliner-bi-large-v1.0, tailored for fine-grained de-identification of real-world medical notes. It performs robustly on noisy, informal clinical text containing misspellings, abbreviations, and shorthand.
- Base model: modern-gliner-bi-large-v1.0
- Fine-tuned on: 1,045 human-annotated medical records; validation on 386.
- F1 Score (baseline): 0.06
- F1 Score (post fine-tuning): 0.998
Entity Labels
The model recognizes the following sensitive entity types:
- person: Healthcare worker (HCW), patient, or others. Does not capture salutations without names.
- address: Geographical entity smaller than a state: cities, districts, taluqs, landmarks, and locations within them.
- address state: Indian states and UTs, including misspellings and abbreviations.
- address country
- identification number: Any numeric or alphanumeric identifier.
- dates: Only fully specified dates or date ranges that can compromise privacy. 'Jan 2020' is not caught but '03/01/2020' is.
- languages
- groups: Religious, tribal, or political groups that can reveal identity.
- company: Named places of occupation or health care facilities.
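The labels above can be passed to the model at inference time. The following is a minimal sketch, assuming the checkpoint loads through the `gliner` Python package the same way as its base model; the threshold of 0.5 and the helper names `predict_entities` and `redact` are illustrative assumptions, not settings documented by this card.

```python
# Labels copied from the list in this model card.
LABELS = [
    "person", "address", "address state", "address country",
    "identification number", "dates", "languages", "groups", "company",
]

MODEL_ID = "lekhansh/modern-gliner-large-mednotes-deidentifier"


def predict_entities(text, threshold=0.5):
    """Run the model on one note (threshold is an assumed value, tune it)."""
    # Imported lazily so redact() can be used and tested without gliner installed.
    from gliner import GLiNER

    model = GLiNER.from_pretrained(MODEL_ID)  # downloads weights on first use
    # Each result is a dict with "start", "end", "text", "label" and "score".
    return model.predict_entities(text, LABELS, threshold=threshold)


def redact(text, entities):
    """Replace every predicted span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    # Longest span first when two spans start at the same offset.
    for ent in sorted(entities, key=lambda e: (e["start"], -e["end"])):
        if ent["end"] <= cursor:
            continue  # span lies fully inside text that is already redacted
        start = max(ent["start"], cursor)
        pieces.append(text[cursor:start])
        pieces.append("[" + ent["label"].upper().replace(" ", "_") + "]")
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)
```

A typical call would be `redact(note, predict_entities(note))`. Overlapping spans are merged rather than dropped, so a partially overlapping prediction cannot leave part of an identifier in the output.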
Warning: Classification between address types (address, address state, address country) has sub-0.9 accuracy. Treat all address-like entities as sensitive in downstream processing.
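One way to follow this advice is to collapse the three address labels into a single category before any downstream logic looks at them. This is a small sketch, assuming entities are dicts with a `label` key as returned by the `gliner` package; `collapse_address_labels` is a hypothetical helper, not part of the model or the library.

```python
# The three address subtypes the card says are often confused with each other.
ADDRESS_LABELS = {"address", "address state", "address country"}


def collapse_address_labels(entities):
    """Return copies of the entities with every address subtype relabelled 'address'."""
    return [
        {**ent, "label": "address"} if ent["label"] in ADDRESS_LABELS else dict(ent)
        for ent in entities
    ]
```

Because the function returns copies, the original predictions stay available if the fine-grained labels are still wanted for analysis.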
Intended Use
Primary use: De-identification of free-text clinical notes in electronic health records (EHRs)
Not evaluated for:
- Zero-shot performance on other domains or tasks. Because the recall ratio was 0, catastrophic forgetting is expected.
- Multilingual or domain transfer scenarios
Limitations and Disclaimer
DISCLAIMER: HUMAN OVERSIGHT REQUIRED
This model must not be used as a standalone anonymization tool.
Despite high validation performance, errors may still occur, especially with rare entity forms, nested entities, or previously unseen abbreviations. Always include a human audit step before using de-identified records in research or production.
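To support the audit step, predictions can be turned into a review sheet that shows each detected span with some surrounding context, least confident first. This is only a sketch under assumptions: the `score` field is taken from the `gliner` output format, while `build_audit_rows` and the 20-character context window are hypothetical choices.

```python
def build_audit_rows(text, entities, window=20):
    """List every predicted span with context, lowest model score first."""
    rows = []
    for ent in entities:
        left = max(0, ent["start"] - window)
        right = min(len(text), ent["end"] + window)
        rows.append({
            "label": ent["label"],
            "span": text[ent["start"]:ent["end"]],
            "score": ent.get("score", 0.0),  # missing scores sort to the top
            "context": text[left:right],
        })
    # Ascending order puts the least certain predictions in front of the reviewer.
    return sorted(rows, key=lambda row: row["score"])
```

A sheet like this only surfaces what the model found. Missed entities do not appear in it, so the reviewer still needs to read the full note.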
License
Released under the MIT License. See repository for terms.
Citation
@misc{lekhansh2025deidentifier,
  title = {Fine-grained Medical Note De-identifier using Modern Gliner Large},
  author = {Lekhansh},
  year = {2025},
  note = {Fine-tuned on real-world annotated records for clinical PII removal},
  url = {https://huggingface.co/lekhansh/modern-gliner-large-mednotes-deidentifier}
}