Model Card for CardioNER.nl_128_centered

This is a UMCU/CardioBERTa.nl_clinical base model fine-tuned for span classification. We used IOB tagging, which makes it easier to aggregate token-level predictions into spans over a sequence. This specific model was trained on a batch of about 500 span-labeled documents.
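As an illustration of why IOB tags make aggregation straightforward, here is a minimal sketch that merges token-level tags into spans. The function name and the exact tag strings (e.g. "B-disease") are assumptions made for the example; the label names stored in the model config may differ.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Merge token-level IOB tags into (label, start_token, end_token_exclusive) spans."""
    spans = []
    current_label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Close the open span on O, on a new B-, or on an I- with a different label
        if current_label is not None and (prefix != "I" or label != current_label):
            spans.append((current_label, start, i))
            current_label, start = None, None
        # Open a span on B-; also tolerate a stray I- that has no preceding B-
        if prefix == "B" or (prefix == "I" and current_label is None):
            current_label, start = label, i
    if current_label is not None:
        spans.append((current_label, start, len(tags)))
    return spans


# Hypothetical tag sequence, not real model output
tags = ["O", "B-disease", "I-disease", "O", "B-medication", "B-medication", "I-symptom"]
spans = iob_to_spans(tags)
print(spans)
# [('disease', 1, 3), ('medication', 4, 5), ('medication', 5, 6), ('symptom', 6, 7)]
```

In practice the pipeline's aggregation_strategy performs this merging for you; the sketch only shows the idea.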

This version was trained with context windows of 128 tokens. For chunking we used a span-centered splitter.
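The splitter itself is not described here, so the following is only a sketch of what span-centered chunking could look like, assuming it places a fixed-size token window around the midpoint of each labeled span and clips it to the document. The function name and the clipping behaviour are assumptions, not the project's actual implementation.

```python
from typing import Tuple


def centered_window(n_tokens: int, span: Tuple[int, int], window: int = 128) -> Tuple[int, int]:
    """Return (start, end) token indices of a window roughly centered on `span`.

    `span` is (start_token, end_token_exclusive). The window is clipped to
    [0, n_tokens]. Spans longer than `window` are not handled specially.
    """
    span_start, span_end = span
    center = (span_start + span_end) // 2
    start = max(0, center - window // 2)
    end = min(n_tokens, start + window)
    # If the window was clipped at the document end, shift it left to keep it full
    start = max(0, end - window)
    return start, end


print(centered_window(1000, (500, 504)))  # (438, 566)
print(centered_window(1000, (990, 995)))  # (872, 1000)
```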

Training was performed with 10-fold cross-validation, with weight averaging of the best epochs per fold.
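Weight averaging can be pictured as an element-wise mean over checkpoints. The sketch below uses plain Python lists in place of tensors to stay framework-agnostic; which checkpoints are combined (across epochs, across folds, or both) is not specified here, and the parameter names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def average_weights(checkpoints: List[Weights]) -> Weights:
    """Element-wise mean of parameters across checkpoints with identical keys and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Pair up the i-th value of this parameter from every checkpoint
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Toy checkpoints standing in for the best epoch of two folds
fold_best = [
    {"classifier.weight": [0.0, 2.0], "classifier.bias": [1.0]},
    {"classifier.weight": [2.0, 4.0], "classifier.bias": [3.0]},
]
avg = average_weights(fold_best)
print(avg)  # {'classifier.weight': [1.0, 3.0], 'classifier.bias': [2.0]}
```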

Expected input and output

The input should be a string of Dutch clinical cardiology text.

CardioNER.nl_128_centered is a multiclass span classification model. The classes that can be predicted are ['procedure', 'medication', 'disease', 'symptom'].

Extracting span classification from CardioNER.nl_128_centered

The following script converts a string of <512 tokens to a list of span predictions.

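from transformers import pipeline

# Hub identifier of this model; also used to load the matching tokenizer
model = "UMCU/cardioner.nl_128_centered"

le_pipe = pipeline('ner',
                    model=model,
                    tokenizer=model,
                    aggregation_strategy="simple",  # merge adjacent tokens with the same label into spans
                    device=-1)  # -1 runs on CPU

# SOME_TEXT: a Dutch clinical string of fewer than 512 tokens
named_ents = le_pipe(SOME_TEXT)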
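With an aggregation strategy set, the pipeline returns a list of dicts with the keys entity_group, score, word, start and end. As a small sketch of post-processing, the helper below groups span texts by class and drops low-scoring spans. The helper name, the threshold, and the example predictions are illustrative assumptions, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(named_ents: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group predicted span texts by class, keeping spans that score at least `min_score`."""
    grouped = defaultdict(list)
    for ent in named_ents:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Illustrative predictions in the pipeline's output format, not real model output
example = [
    {"entity_group": "medication", "score": 0.98, "word": "metoprolol", "start": 14, "end": 24},
    {"entity_group": "disease", "score": 0.41, "word": "hartfalen", "start": 30, "end": 39},
]
print(group_entities(example, min_score=0.5))  # {'medication': ['metoprolol']}
```

The entity_group strings depend on the label names in the model config, so they may not match the class names above character for character.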

To process a string of arbitrary length, you can split it into sentences or paragraphs, e.g. with pysbd or spaCy (sentencizer), and iteratively parse the resulting list of segments with the span-classification pipe. You can also use the stride option built into the transformers pipeline, where the stride value sets the number of overlapping tokens between consecutive chunks; this requires a fast tokenizer and does not work with aggregation_strategy=None:

named_ents = le_pipe(SOME_TEXT, stride=256)
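For the split-and-iterate approach, span offsets come back relative to each segment, so they need shifting to index into the full text. A minimal sketch follows, assuming a blank-line paragraph split as a simple stand-in for pysbd or a spaCy sentencizer; predict_long_text and the ner_fn parameter are hypothetical names.

```python
import re
from typing import Callable, List


def predict_long_text(text: str, ner_fn: Callable[[str], List[dict]]) -> List[dict]:
    """Run `ner_fn` on each paragraph and shift span offsets back to the full text."""
    results = []
    # Each match is a run of non-empty lines, i.e. one paragraph between blank lines
    for match in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        offset = match.start()
        for ent in ner_fn(match.group()):
            # Copy the prediction with offsets relative to the whole document
            results.append({**ent, "start": ent["start"] + offset, "end": ent["end"] + offset})
    return results
```

With the pipeline defined earlier, this could be called as predict_long_text(SOME_TEXT, le_pipe). Each paragraph still has to fit within the model's token limit, so very long paragraphs would need a finer split.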

Data description

CardioCCC: manually labeled cardiology discharge letters, annotated with four span classes: procedure, medication, disease, and symptom.

Acknowledgement

This is part of the DT4H project.

DOI and reference

For more details about training, evaluation, and other scripts, see the CardioNER GitHub repo. For more information on the background, see the DataTools4Heart Hugging Face page or website.

