TCMNER

Model description

TCMNER is a fine-tuned BERT model that is ready to use for Named Entity Recognition (NER) on Traditional Chinese Medicine (TCM) text and, according to the author, achieves state-of-the-art performance on this task. It has been trained to recognize six types of entities: prescription (方剂), herb (本草), source (来源), disease (病名), symptom (症状) and syndrome (证型).

Specifically, this model is built on TCMRoBERTa, a RoBERTa model fine-tuned for Traditional Chinese Medicine, and was then fine-tuned on the Chinese version of the Haiwei AI Lab's Named Entity Recognition dataset.

TCMRoBERTa is currently a closed-source model used within my own company; it will be open-sourced in the future.

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Monor/TCMNER")
model = AutoModelForTokenClassification.from_pretrained("Monor/TCMNER")

# Build a NER pipeline; it returns one prediction per tagged token
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "化滞汤,出处：《证治汇补》卷八。。组成：青皮20g，陈皮20g，厚朴20g，枳实20g，黄芩20g，黄连20g，当归20g，芍药20g，木香5g，槟榔8g，滑石3g，甘草4g。。主治：下痢因于食积气滞者。"

ner_results = nlp(example)
print(ner_results)
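
The pipeline above returns token-level predictions, so a multi-character entity such as a prescription name arrives as several separate items. The following is a minimal sketch of merging them into entity spans. It assumes each prediction is a dict with `entity`, `start` and `end` keys, as the Transformers token-classification pipeline produces when no aggregation is requested. The function name `merge_bio_tokens` is hypothetical, and the sample predictions are hand-written for illustration, not real output of this model.

```python
from typing import Dict, List, Optional


def merge_bio_tokens(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level B-/I- predictions into entity spans (illustrative helper)."""
    spans: List[Dict] = []
    current: Optional[Dict] = None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            # An outside tag closes any open span.
            if current is not None:
                spans.append(current)
                current = None
            continue
        prefix, _, label = tag.partition("-")
        # Extend the open span only for an I- tag of the same type
        # that starts exactly where the span currently ends.
        if (
            current is not None
            and prefix == "I"
            and label == current["type"]
            and tok["start"] == current["end"]
        ):
            current["end"] = tok["end"]
            continue
        # Otherwise close the open span (if any) and start a new one.
        if current is not None:
            spans.append(current)
        current = {"type": label, "start": tok["start"], "end": tok["end"]}
    if current is not None:
        spans.append(current)
    # Recover the surface text of each span from the character offsets.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written sample predictions (an assumption, not real model output).
sample_text = "化滞汤,出处：《证治汇补》卷八。"
sample_tokens = [
    {"entity": "B-方剂", "start": 0, "end": 1},
    {"entity": "I-方剂", "start": 1, "end": 2},
    {"entity": "I-方剂", "start": 2, "end": 3},
    {"entity": "B-来源", "start": 8, "end": 9},
    {"entity": "I-来源", "start": 9, "end": 10},
    {"entity": "I-来源", "start": 10, "end": 11},
    {"entity": "I-来源", "start": 11, "end": 12},
]

entities = merge_bio_tokens(sample_text, sample_tokens)
for entity in entities:
    print(entity["type"], entity["text"])
```

With real pipeline output, the call would be `merge_bio_tokens(example, ner_results)`. The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens for you, which may be enough if custom merging rules are not needed.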

Training data

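This model was fine-tuned on the author's own dataset (the Haiwei AI Lab Named Entity Recognition dataset mentioned in the model description). Each token is labeled with one of the following tags: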

Abbreviation	Description
O	Outside of a named entity
B-方剂	Beginning of a prescription entity right after another prescription entity
I-方剂	Prescription entity
B-本草	Beginning of a herb entity right after another herb entity
I-本草	Herb entity
B-来源	Beginning of a source of prescription right after another source of prescription
I-来源	Source entity
B-病名	Beginning of a disease's name right after another disease's name
I-病名	Disease's name
B-症状	Beginning of a symptom right after another symptom
I-症状	Symptom
B-证型	Beginning of a syndrome right after another syndrome
I-证型	Syndrome
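
The tag set in the table amounts to 13 labels: `O` plus a `B-` and an `I-` tag for each of the six entity types. As a rough sketch, the label list and the id mappings that a token-classification model typically carries could be built as follows. The ordering here is an assumption made for illustration; the actual mapping used by this model is whatever its configuration (`id2label`) defines.

```python
# Entity types taken from the tag table; the order is an assumption.
ENTITY_TYPES = ["方剂", "本草", "来源", "病名", "症状", "证型"]

# "O" first, then a B-/I- pair per entity type.
labels = ["O"] + [
    f"{prefix}-{entity_type}"
    for entity_type in ENTITY_TYPES
    for prefix in ("B", "I")
]

# Mappings between label strings and integer ids.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(labels))
print(labels)
```

To check the real mapping, inspecting `model.config.id2label` on the loaded model is more reliable than reconstructing it by hand.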

Eval results

Notices

The model is free for commercial use.
I do not plan to write a paper about this model. If you use any details of it in your own paper, please mention this model. Thanks.

Bonus

All of our TCM domain models will be open-sourced soon, including:

A series of pre-trained models
Named entity recognition for TCM
Text localization in ancient images
OCR for ancient images

And so on
