distilbert-medication-ner

This model is a fine-tuned version of distilbert-base-cased, trained on synthetic medication data generated by Synthea.

More details on how this model was trained can be found on GitHub.

Model Description

A fine-tuned NER model developed to handle 5 specific entities (i.e. DRUG, DOSAGE, ROUTE, BRAND, QUANTITY) when processing medication strings such as:

  • Ibuprofen 100 MG Oral Tablet
  • 1 ML medroxyprogesterone acetate 150 MG/ML Injection
  • Acetaminophen 325 MG / Oxycodone Hydrochloride 10 MG Oral Tablet [Percocet]

The model was trained and evaluated on small, manually annotated datasets (train_n_samples=309, eval_n_samples=335) and achieved the following evaluation metrics:

  • Precision: 0.998
  • Recall: 0.983
  • F1: 0.991

Usage

  1. Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "jackleejm/distilbert-medication-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
  2. Set up a pipeline and run inference:
from transformers import pipeline

ner_pipeline = pipeline(
  task="token-classification",
  model=model,
  tokenizer=tokenizer,
  aggregation_strategy="simple",
  device_map="auto",
)

# Named `texts` to avoid shadowing Python's built-in input()
texts = ["Acetaminophen 325 MG Oral Tablet"]
results = ner_pipeline(texts)

print(results)

# Outputs
[
  [
    {
      "word": "Acetaminophen",
      "score": np.float32(0.99948627),
      "entity_group": "DRUG",
      "start": 0,
      "end": 13
    },
    {
      "word": "325 MG",
      "score": np.float32(0.99882394),
      "entity_group": "DOSAGE",
      "start": 14,
      "end": 20
    },
    {
      "word": "Oral Tablet",
      "score": np.float32(0.9994621),
      "entity_group": "ROUTE",
      "start": 21,
      "end": 32
    }
  ]
]
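
The pipeline returns one list of span dictionaries per input string. As a minimal sketch of how that output might be post-processed, the helper below collects the spans of a single input by entity label. The function name `group_entities` and the `min_score` threshold are hypothetical and not part of this model or of Transformers; the sample data is copied from the output above, with scores written as plain floats.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Collect spans for one input into {entity_group: [words, ...]}.

    `entities` is the list of dicts the pipeline returns for a single
    string. Spans scoring below `min_score` (a hypothetical cutoff) are
    dropped; the order of spans within each label is preserved.
    """
    grouped = defaultdict(list)
    for ent in entities:
        # float() also accepts NumPy scalars such as np.float32
        if float(ent["score"]) >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Sample copied from the output shown above (scores as plain floats).
sample = [
    {"word": "Acetaminophen", "score": 0.99948627, "entity_group": "DRUG", "start": 0, "end": 13},
    {"word": "325 MG", "score": 0.99882394, "entity_group": "DOSAGE", "start": 14, "end": 20},
    {"word": "Oral Tablet", "score": 0.9994621, "entity_group": "ROUTE", "start": 21, "end": 32},
]

print(group_entities(sample))
# {'DRUG': ['Acetaminophen'], 'DOSAGE': ['325 MG'], 'ROUTE': ['Oral Tablet']}
```

With the real pipeline, the same helper would be applied to each element of `results`, for example `[group_entities(r) for r in results]`. Strings containing several drugs yield several words under the same label, which is why each value is a list.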

Training Procedure

Training Hyperparameters

  • learning_rate: 2e-5
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 20
  • weight_decay: 0.01
  • evaluation_strategy: "steps"
  • eval_steps: 50
  • load_best_model_at_end: True
  • metric_for_best_model: "f1"
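
For readers who want to reproduce a similar setup, the hyperparameters above could be passed to `TrainingArguments` roughly as sketched below. This is not the author's training script: the `HYPERPARAMS` dictionary, the `build_training_args` helper, and the `output_dir` value are illustrative assumptions. The sketch also assumes a recent Transformers release, where the evaluation schedule argument is spelled `eval_strategy`; older releases used `evaluation_strategy`, the name listed above.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 20,
    "weight_decay": 0.01,
    # Assumption: recent releases use `eval_strategy`;
    # older ones spell it `evaluation_strategy`.
    "eval_strategy": "steps",
    "eval_steps": 50,
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",
}


def build_training_args(output_dir="distilbert-medication-ner"):
    """Build TrainingArguments from HYPERPARAMS (hypothetical helper).

    The import is deferred so the dictionary above can be inspected
    without Transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Because `metric_for_best_model` is "f1", the `compute_metrics` function given to the `Trainer` would need to return a key named `f1`. The card does not list checkpoint-saving settings, so none are set here.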

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.6.0
  • Datasets 3.3.2
  • Tokenizers 0.21.0