Model Card for Cardioner.nl 128
This is a CLTL/MedRoBERTa.nl base model with fine-tuned heads for span classification. Labels follow the IOB tagging scheme, which makes it straightforward to aggregate token-level predictions into spans. This specific model was trained on a batch of about 500 span-labeled documents.
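To illustrate why IOB tags make aggregation easy, here is a minimal sketch that merges token-level tags into spans. The tag strings (for example `B-medication`) are assumptions for illustration; the exact label names used by the model may differ, and this is not the aggregation code used by the pipeline itself.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Merge token-level IOB tags into (label, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An I- tag with the same label simply extends the open span.
        if prefix == "I" and label == name:
            continue
        # Any other tag closes the span that is currently open.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B- opens a new span; a stray I- is treated leniently as a start.
        if prefix in ("B", "I"):
            label, start = name, i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tag sequence for a five-token sentence.
tags = ["O", "B-medication", "I-medication", "O", "B-disease"]
print(iob_to_spans(tags))  # [('medication', 1, 3), ('disease', 4, 5)]
```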
This version was trained with context windows of 128 tokens. For chunking we used a paragraph-based splitter.
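A paragraph-based splitter with a token budget could look roughly like the sketch below. It is not the splitter used for training: the function name `chunk_paragraphs`, the blank-line paragraph boundary, and the whitespace token count are all assumptions. In practice you would pass a `count_tokens` function based on the model's tokenizer.

```python
import re
from typing import Callable, List


def chunk_paragraphs(
    text: str,
    max_tokens: int = 128,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_tokens."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, used = [], [], 0
    for para in paragraphs:
        n = count_tokens(para)
        # Start a new chunk when this paragraph would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


print(chunk_paragraphs("a b c\n\nd e\n\nf g h i", max_tokens=5))
```

Note that a single paragraph longer than `max_tokens` is kept as one oversized chunk here; it would need further splitting, for example by sentence.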
Training was performed with 10-fold cross-validation (CV), with chained SLERP (spherical linear interpolation) averaging of the best epochs per fold.
NOTE: the base weights are exactly the same as in the original MedRoBERTa.nl. We added an expressive head with about 1.4 million parameters, which was trained on the CardioCCC NER dataset.
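For readers unfamiliar with SLERP, the sketch below shows spherical linear interpolation between two flattened weight vectors and one possible way to chain it over several checkpoints. The chaining rule (interpolating the running result with checkpoint k using t = 1/k) is an assumption chosen for illustration; the actual merging procedure is in the CardioNER training scripts and may differ.

```python
import math
from typing import List, Sequence


def slerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Spherical linear interpolation between vectors a and b at fraction t."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # No direction to interpolate along: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    dot = sum(x * y for x, y in zip(a, b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def chained_slerp(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Fold checkpoints one by one into a running merge (assumed t = 1/k)."""
    merged = list(vectors[0])
    for k, vec in enumerate(vectors[1:], start=2):
        merged = slerp(merged, vec, 1.0 / k)
    return merged


print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```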
Expected input and output
The input should be a string with Dutch clinical text related to cardiology.
CardioNER.nl_128 is a multiclass span classification model. The classes that can be predicted are
- procedure,
- medication,
- disease,
- symptom.
Extracting span classification from CardioNER.nl_128xtokenWindow
The following script converts a string of <128 tokens to a list of span predictions.
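from transformers import pipeline

# `model` must be set to the Hugging Face Hub ID or local path of this model.
le_pipe = pipeline('ner',
                   model=model,
                   tokenizer=model,
                   aggregation_strategy="simple",  # merge tokens into entity groups
                   trust_remote_code=True,
                   device=-1)  # -1 runs on CPU

# SOME_TEXT is a Dutch clinical text of fewer than 128 tokens.
named_ents = le_pipe(SOME_TEXT)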
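With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below groups such a result by class and drops low-confidence spans. The helper name `group_by_class`, the threshold, and the example values are made up for illustration and are not actual model output; the exact label strings may also differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_class(named_ents: List[dict], min_score: float = 0.0) -> Dict[str, List[Tuple[int, int, str]]]:
    """Group aggregated pipeline output by entity class, keeping character offsets."""
    grouped = defaultdict(list)
    for ent in named_ents:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append((ent["start"], ent["end"], ent["word"]))
    return dict(grouped)


# Hypothetical result in the format returned by the pipeline.
example = [
    {"entity_group": "medication", "score": 0.98, "word": "metoprolol", "start": 15, "end": 25},
    {"entity_group": "disease", "score": 0.41, "word": "hartfalen", "start": 31, "end": 40},
]
print(group_by_class(example, min_score=0.5))  # {'medication': [(15, 25, 'metoprolol')]}
```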
To process a string of arbitrary length, you can split it into sentences or paragraphs using e.g. pysbd or spaCy (sentencizer) and pass the resulting pieces to the span-classification pipe one at a time. You can also use the stride option built into the transformers pipeline, although this is limited to non-overlapping strides, requires a fast tokenizer, and does not work with aggregation_strategy=None:
named_ents = le_pipe(SOME_TEXT, stride=256)
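The split-and-iterate approach can be sketched as follows. The function name `predict_long_text` and the blank-line paragraph boundary are assumptions; `pipe` stands for any callable that behaves like the span-classification pipeline above. The main point is that `start` and `end` are relative to each piece and have to be shifted back to positions in the full text.

```python
import re
from typing import Callable, List


def predict_long_text(text: str, pipe: Callable[[str], List[dict]]) -> List[dict]:
    """Run `pipe` per paragraph and map character offsets back to `text`."""
    results = []
    offset = 0
    # The capturing group keeps the separators, so the pieces concatenate
    # back to the original text and `offset` stays exact.
    for piece in re.split(r"(\n\s*\n)", text):
        if piece.strip():
            for ent in pipe(piece):
                shifted = dict(ent)
                shifted["start"] = ent["start"] + offset
                shifted["end"] = ent["end"] + offset
                results.append(shifted)
        offset += len(piece)
    return results


def fake_pipe(piece: str) -> List[dict]:
    """Stand-in for the real pipeline: tags every occurrence of 'aspirine'."""
    return [
        {"entity_group": "medication", "score": 1.0, "word": m.group(),
         "start": m.start(), "end": m.end()}
        for m in re.finditer("aspirine", piece)
    ]


doc = "Patient gebruikt aspirine.\n\nGeen klachten. aspirine gestopt."
ents = predict_long_text(doc, fake_pipe)
print([doc[e["start"]:e["end"]] for e in ents])  # ['aspirine', 'aspirine']
```

Paragraphs that exceed the 128-token window would still need to be split further, for example by sentence, before being passed to the pipe.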
Data description
CardioCCC: manually labeled cardiology discharge letters, annotated with the classes procedure, medication, disease, and symptom.
Acknowledgement
This is part of the DT4H project.
Doi and reference
For more details about training/evaluation and other scripts, see the CardioNER GitHub repo. For more information on the background, see the Datatools4Heart Hugging Face page/website.
Model tree for UMCU/MedRoBERTa.nl_CardioNER_headOnly
Base model
CLTL/MedRoBERTa.nl