Multilingual Symptom Extraction (English + Amharic)

Model Description

This model is a fine-tuned XLM-RoBERTa-base (XLM-R) model that extracts symptoms from patient-generated texts in English and Amharic. It was developed as part of an MSc thesis at the Technical University of Munich, with the aim of supporting AI-powered symptom extraction platforms in multilingual healthcare settings.

The model performs named entity recognition (NER) to identify symptom mentions in unstructured text, using the BIO tagging scheme (B marks the first token of a mention, I a token inside one, and O a token outside any mention).
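
To make the BIO scheme concrete, the following minimal sketch groups a tagged token sequence into symptom mentions. The label name `SYMPTOM`, the helper `bio_to_spans`, and the example sentence are illustrative assumptions; the label set the model actually emits is not stated in this card.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, mention text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the mention that is currently open, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open mention
            # (such stray I- tokens are dropped in this sketch).
            flush()
    flush()
    return spans


# Hypothetical example; "SYMPTOM" is an assumed label name.
tokens = ["I", "have", "a", "severe", "headache", "and", "nausea", "."]
tags = ["O", "O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "B-SYMPTOM", "O"]
print(bio_to_spans(tokens, tags))
# [('SYMPTOM', 'severe headache'), ('SYMPTOM', 'nausea')]
```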

Intended Uses & Limitations

Intended uses:

Automatic symptom extraction from patient-generated texts in English and Amharic.
Research in multilingual biomedical NLP.
Integration into AI diagnostic platforms for low-resource languages.

Limitations:

Trained only on the datasets described in the thesis; performance on other domains may vary.
Not validated for real-world clinical deployment; further testing and ethical approval are required first.
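
For research use, the model can presumably be loaded with the Hugging Face `transformers` token-classification pipeline. The sketch below is an assumption rather than the author's documented usage: the helper names `load_tagger` and `extract_mentions` and the `min_score` threshold of 0.5 are hypothetical, and the labels returned depend on the model's own configuration.

```python
from typing import Callable, Dict, List

MODEL_ID = "kechemale/eng-am-symptom-ner"


def load_tagger(model_id: str = MODEL_ID) -> Callable[[str], List[Dict]]:
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of the module works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges subword pieces into whole mentions.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def extract_mentions(
    text: str,
    tagger: Callable[[str], List[Dict]],
    min_score: float = 0.5,  # hypothetical threshold, not from the thesis
) -> List[Dict]:
    """Keep aggregated mentions whose confidence is at least min_score."""
    return [
        {"text": e["word"], "label": e["entity_group"], "score": float(e["score"])}
        for e in tagger(text)
        if e["score"] >= min_score
    ]


# Usage (requires network access plus the transformers and torch packages):
# tagger = load_tagger()
# print(extract_mentions("I have had a headache and fever since yesterday.", tagger))
```

Passing the tagger in as a plain callable keeps the filtering step easy to exercise without downloading the model. Output from such a script is research output only, in line with the limitations above.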

Training Data

Domain adaptation: unlabeled English, Amharic, and Tigrinya corpora collected from diverse sources.
Fine-tuning: a combination of publicly available datasets and synthetically generated data, which was then labeled.
Preprocessing: tokenization, normalization, and BIO tagging with Doccano.
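
Annotation tools such as Doccano typically record labels as character-offset spans, which then have to be turned into per-token BIO tags before fine-tuning. The exact export format and tokenizer used in the thesis are not stated here, so the sketch below assumes `(start, end, label)` triples and simple whitespace tokenization; the function name `spans_to_bio` and the `SYMPTOM` label are hypothetical.

```python
import re
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Convert character-offset annotations into (token, BIO tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    for match in re.finditer(r"\S+", text):  # whitespace tokenization (assumption)
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if the two character ranges overlap.
            if match.start() < end and match.end() > start:
                # The token covering the span start gets B-, later tokens get I-.
                prefix = "B-" if match.start() <= start else "I-"
                tag = prefix + label
                break
        pairs.append((match.group(), tag))
    return pairs


# Hypothetical annotated sentence; offsets are computed rather than hard-coded.
text = "I feel dizzy and have chest pain"
spans = [
    (text.index("dizzy"), text.index("dizzy") + len("dizzy"), "SYMPTOM"),
    (text.index("chest pain"), text.index("chest pain") + len("chest pain"), "SYMPTOM"),
]
for token, tag in spans_to_bio(text, spans):
    print(token, tag)
```

A subword tokenizer such as the one used by XLM-R would need a further alignment step from word-level tags to subword tokens, which this sketch leaves out.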

Evaluation

Metrics: precision, recall, and F1-score for symptom extraction.
Results and detailed evaluation are described in the MSc thesis:
Negash Desalegn. (2025). Bridging the Linguistic Gap in Healthcare: A Multilingual AI Approach for Symptom Extraction in Low-Resource Languages. Technical University of Munich.
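
As a reminder of how the three metrics relate, the sketch below computes them at the entity level with exact-match scoring: a predicted mention counts as correct only if its sentence, token boundaries, and label all match a gold mention. Whether the thesis scores at the entity or token level is not stated here, so this is an assumption, and the example mentions are made up for illustration.

```python
from typing import Dict, Set, Tuple

# (sentence index, first token index, last token index, label)
Entity = Tuple[int, int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match entity-level precision, recall, and F1."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: one of two predictions matches one of three gold mentions.
gold = {(0, 3, 4, "SYMPTOM"), (0, 6, 6, "SYMPTOM"), (1, 2, 2, "SYMPTOM")}
predicted = {(0, 3, 4, "SYMPTOM"), (1, 1, 2, "SYMPTOM")}
scores = precision_recall_f1(gold, predicted)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 0.5, 'recall': 0.333, 'f1': 0.4}
```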

Model Details

Model: kechemale/eng-am-symptom-ner
Base model: FacebookAI/xlm-roberta-base
Model size: 277M params
Tensor type: F32
Format: Safetensors
