SINA-BERT: A Pre-trained Language Model for Analysis of Medical Texts in Persian
SINA-BERT is the first Persian medical language model based on BERT (Devlin et al., 2018). SINA-BERT is pre-trained on a large-scale corpus of medical content, including formal and informal texts collected from a variety of online resources, in order to improve performance on health-care related tasks.
Model Evaluation
SINA-BERT can be used for Persian medical text representation tasks. In our paper, we have examined the following:
- categorization of medical questions,
- medical sentiment analysis,
- and medical question retrieval.
For each task, we have developed Persian annotated data sets and learned a task-specific representation of the data, in particular for complex and long medical questions. Using the same architecture across tasks, SINA-BERT outperforms BERT-based models previously made available for Persian.
For details on the datasets and results, please refer to the SINA-BERT paper: arXiv:2104.07613v1
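As a concrete illustration of one of these tasks, the sketch below shows how SINA-BERT could be fine-tuned for medical question categorization with the Hugging Face transformers library. The label set, the single-example training step, and the hyper-parameters are hypothetical placeholders for illustration, not the setup used in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical categories; the actual label set is defined by the
# annotated dataset described in the SINA-BERT paper.
labels = ["cardiology", "dermatology", "neurology"]

tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "hooshafzar/SINA-BERT", num_labels=len(labels)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step on a single (question, label) pair;
# a real setup would loop over batches of annotated questions.
question = "..."  # a Persian medical question goes here
inputs = tokenizer(question, truncation=True, max_length=512, return_tensors="pt")
loss = model(**inputs, labels=torch.tensor([0])).loss
loss.backward()
optimizer.step()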
- Developed by: HooshAfzar Salamat Team
- Language(s) (NLP): Persian
- Finetuned from model: ParsBERT (HooshvareLab/bert-base-parsbert-uncased)
Model Sources
- Repository: GitHub
- Paper: arXiv:2104.07613
How to use
from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the configuration, tokenizer, and pre-trained model
# from the Hugging Face Hub.
config = AutoConfig.from_pretrained("hooshafzar/SINA-BERT")
tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModel.from_pretrained("hooshafzar/SINA-BERT")
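Continuing from the snippet above, the following is a minimal sketch of extracting a sentence-level embedding, for example as a building block for medical question retrieval. The pooling choice (the [CLS] token's hidden state) is an assumption made for illustration, not a method prescribed by the paper.
import torch

text = "..."  # a Persian medical question goes here
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Take the [CLS] token's hidden state as the sentence embedding
# (pooling choice is an assumption, not prescribed by the paper).
embedding = outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)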
Citation
@article{taghizadeh2021sina,
  title={SINA-BERT: a pre-trained language model for analysis of medical texts in Persian},
  author={Taghizadeh, Nasrin and Doostmohammadi, Ehsan and Seifossadat, Elham and Rabiee, Hamid R and Tahaei, Maedeh S},
  journal={arXiv preprint arXiv:2104.07613},
  year={2021}
}