distilbert-medication-ner
This model is a fine-tuned version of distilbert-base-cased, trained on medication data synthetically generated by Synthea.
More details on how this model was trained can be found on GitHub.
Model Description
A fine-tuned named-entity recognition (NER) model that recognizes five entity types (DRUG, DOSAGE, ROUTE, BRAND, QUANTITY) in medication strings such as:
- Ibuprofen 100 MG Oral Tablet
- 1 ML medroxyprogesterone acetate 150 MG/ML Injection
- Acetaminophen 325 MG / Oxycodone Hydrochloride 10 MG Oral Tablet [Percocet]
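To make the five labels concrete, here is a minimal sketch of span-level annotations for the second example above. It uses the same `word`/`entity_group`/`start`/`end` shape that the pipeline returns. The label assignment is our own reading and not the author's annotation format. For example, the dose form "Injection" is treated as ROUTE by analogy with "Oral Tablet" in the usage example below. The `annotate` helper is also hypothetical.

```python
# Hypothetical annotation helper; the label choices below are assumptions.
LABELS = {"DRUG", "DOSAGE", "ROUTE", "BRAND", "QUANTITY"}


def annotate(text, parts):
    """Turn (substring, label) pairs into character-offset spans.

    Substrings are searched left to right, so repeated substrings are
    matched in their order of appearance.
    """
    spans = []
    cursor = 0
    for substring, label in parts:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        start = text.index(substring, cursor)  # raises ValueError if absent
        end = start + len(substring)
        spans.append({"word": substring, "entity_group": label, "start": start, "end": end})
        cursor = end
    return spans


text = "1 ML medroxyprogesterone acetate 150 MG/ML Injection"
spans = annotate(text, [
    ("1 ML", "QUANTITY"),
    ("medroxyprogesterone acetate", "DRUG"),
    ("150 MG/ML", "DOSAGE"),
    ("Injection", "ROUTE"),
])
for span in spans:
    print(span)
```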
The model was trained and evaluated on small, manually annotated datasets (train_n_samples=309, eval_n_samples=335) and achieved the following evaluation metrics:
- Precision: 0.998
- Recall: 0.983
- F1: 0.991
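F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). You can run a quick consistency check by recomputing it from the rounded figures above. The result, about 0.990, differs slightly from the reported 0.991, but rounding precision and recall to three decimals can account for that gap.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded evaluation metrics reported above
f1 = f1_score(0.998, 0.983)
print(f"{f1:.4f}")  # prints 0.9904
```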
Usage
- Load the model:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "jackleejm/distilbert-medication-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
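After loading, the model's label vocabulary is available as `model.config.id2label`. The card does not document the tag scheme. Assuming BIO-prefixed labels (`B-DRUG`, `I-DRUG`, ..., `O`), the following minimal sketch recovers the entity types from such a mapping. The `example_id2label` dict is hypothetical and stands in for the real one.

```python
# Hypothetical mapping; the real one is model.config.id2label and may differ.
example_id2label = {
    0: "O",
    1: "B-DRUG", 2: "I-DRUG",
    3: "B-DOSAGE", 4: "I-DOSAGE",
    5: "B-ROUTE", 6: "I-ROUTE",
    7: "B-BRAND", 8: "I-BRAND",
    9: "B-QUANTITY", 10: "I-QUANTITY",
}


def entity_types(id2label):
    """Collect entity types from BIO-style labels, ignoring the 'O' tag."""
    types = set()
    for label in id2label.values():
        if label == "O":
            continue
        # Strip a leading "B-" or "I-" prefix; keep other labels unchanged.
        prefix, sep, rest = label.partition("-")
        types.add(rest if sep and prefix in ("B", "I") else label)
    return sorted(types)


print(entity_types(example_id2label))
# ['BRAND', 'DOSAGE', 'DRUG', 'QUANTITY', 'ROUTE']
```

If the labels are in fact BIO-prefixed, running `entity_types(model.config.id2label)` on the loaded model should list the same five types.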
- Set up a pipeline and run inference:
```python
from transformers import pipeline

# Group token-level predictions into whole entity spans
ner_pipeline = pipeline(
    task="token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
    device_map="auto",
)

# Avoid shadowing the built-in input()
texts = ["Acetaminophen 325 MG Oral Tablet"]
results = ner_pipeline(texts)
print(results)
```
```python
# Outputs
[
    [
        {
            "word": "Acetaminophen",
            "score": np.float32(0.99948627),
            "entity_group": "DRUG",
            "start": 0,
            "end": 13
        },
        {
            "word": "325 MG",
            "score": np.float32(0.99882394),
            "entity_group": "DOSAGE",
            "start": 14,
            "end": 20
        },
        {
            "word": "Oral Tablet",
            "score": np.float32(0.9994621),
            "entity_group": "ROUTE",
            "start": 21,
            "end": 32
        }
    ]
]
```
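The grouped output above comes from `aggregation_strategy="simple"`, which merges adjacent token predictions into entity spans. The sketch below is a simplified, self-contained illustration of that idea and is not the Transformers implementation. It assumes BIO-prefixed token labels, which the card does not confirm, and the subword boundaries and scores are invented. It averages token scores per span and takes each word from the input text via character offsets.

```python
from statistics import mean


def group_entities(text, tokens):
    """Merge consecutive BIO-tagged tokens into entity spans.

    tokens: dicts with 'label', 'score', 'start', 'end' (character offsets).
    A 'B-' tag or a change of entity type starts a new span; 'O' ends one.
    """
    groups = []
    current = None
    for tok in tokens:
        label = tok["label"]
        if label == "O":
            current = None
            continue
        prefix, _, entity = label.partition("-")
        if current is None or prefix == "B" or entity != current["entity_group"]:
            current = {"entity_group": entity, "scores": [], "start": tok["start"]}
            groups.append(current)
        current["scores"].append(tok["score"])
        current["end"] = tok["end"]
    return [
        {
            "word": text[g["start"]:g["end"]],
            "score": mean(g["scores"]),
            "entity_group": g["entity_group"],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


# Hypothetical token-level predictions for the example input
text = "Acetaminophen 325 MG Oral Tablet"
tokens = [
    {"label": "B-DRUG", "score": 0.99, "start": 0, "end": 3},
    {"label": "I-DRUG", "score": 0.98, "start": 3, "end": 13},
    {"label": "B-DOSAGE", "score": 0.99, "start": 14, "end": 17},
    {"label": "I-DOSAGE", "score": 0.97, "start": 18, "end": 20},
    {"label": "B-ROUTE", "score": 0.99, "start": 21, "end": 25},
    {"label": "I-ROUTE", "score": 0.99, "start": 26, "end": 32},
]
for entity in group_entities(text, tokens):
    print(entity["entity_group"], repr(entity["word"]), entity["start"], entity["end"])
# DRUG 'Acetaminophen' 0 13
# DOSAGE '325 MG' 14 20
# ROUTE 'Oral Tablet' 21 32
```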
Training Procedure
Training Hyperparameters
- learning_rate: 2e-5
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 20
- weight_decay: 0.01
- evaluation_strategy: "steps"
- eval_steps: 50
- load_best_model_at_end: True
- metric_for_best_model: "f1"
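With `metric_for_best_model: "f1"`, the Trainer's `compute_metrics` function must return a key named `"f1"`, which the Trainer reports as `eval_f1`. The card does not say how F1 was computed. The following minimal sketch assumes entity-level, exact-span matching over BIO label sequences, in the style of the seqeval convention, written from scratch. Inside a real `compute_metrics`, you would first take the argmax over logits, drop positions labeled -100, and map ids to label strings.

```python
def extract_spans(labels):
    """Return a set of (entity_type, start, end) spans from one BIO sequence."""
    spans = set()
    start, etype = None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel flushes the last span
        prefix, _, ent = label.partition("-")
        continuing = prefix == "I" and ent == etype
        if etype is not None and not continuing:
            spans.add((etype, start, i))
            start, etype = None, None
        if prefix in ("B", "I") and not continuing:
            start, etype = i, ent
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1 (exact span match)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [["B-DRUG", "O", "B-DOSAGE", "I-DOSAGE", "B-ROUTE", "I-ROUTE"]]
pred = [["B-DRUG", "O", "B-DOSAGE", "O", "B-ROUTE", "I-ROUTE"]]
print(entity_f1(gold, pred))  # truncated DOSAGE span counts as one FP and one FN
```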
Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0
- Datasets 3.3.2
- Tokenizers 0.21.0