Sakinah-AI: Optimized AraBERT for Arabic Mental Health Question Classification

Sakinah-AI Project Banner

This repository contains the official fine-tuned model Sakinah-AI-AraBERT-Optimized, one of our submissions to the MentalQA 2025 Shared Task (Track 1).

By: Fatimah Emad Elden & Mumina Abukar

Cairo University & The University of South Wales

📖 Model Description

This model is a fine-tuned version of aubmindlab/bert-base-arabertv2 for multi-label classification of Arabic questions related to mental health. It was trained on the AraHealthQA dataset.

Our approach involved a comprehensive hyperparameter search using the Optuna framework to find the optimal configuration. To address class imbalance, the model was trained using a custom Focal Loss function. This optimized fine-tuning approach significantly outperformed its k-fold ensemble counterpart. On the official blind test set, this model achieved a Weighted F1-score of 0.543.
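
For reference, below is a minimal sketch of a multi-label focal loss in PyTorch, assuming the common BCE-with-logits formulation; the `alpha` and `gamma` arguments correspond to the `focal_alpha` and `focal_gamma` hyperparameters reported under Training Procedure, but the exact implementation lives in `arabert_optmized.py`:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.97, gamma=1.40):
    """Multi-label focal loss (BCE-with-logits variant); a sketch,
    not the exact arabert_optmized.py implementation.

    alpha/gamma correspond to the focal_alpha/focal_gamma
    hyperparameters tuned by Optuna.
    """
    # Unreduced per-label binary cross-entropy
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: probability the model assigns to the correct decision for each label
    probs = torch.sigmoid(logits)
    p_t = targets * probs + (1 - targets) * (1 - probs)
    # alpha_t rebalances positive vs. negative labels
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    # (1 - p_t)^gamma down-weights easy, well-classified labels
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```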

The model predicts one or more of the following labels for a given question:

  • A: Diagnosis (Interpreting symptoms)
  • B: Treatment (Seeking therapies or medications)
  • C: Anatomy and Physiology (Basic medical knowledge)
  • D: Epidemiology (Course, prognosis, causes of diseases)
  • E: Healthy Lifestyle (Diet, exercise, mood control)
  • F: Provider Choices (Recommendations for doctors)
  • Z: Other (Does not fit other categories)
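
When presenting predictions, the letter codes above can be mapped back to readable category names with a small lookup (an illustrative helper, not part of the released code):

```python
LABEL_NAMES = {
    "A": "Diagnosis",
    "B": "Treatment",
    "C": "Anatomy and Physiology",
    "D": "Epidemiology",
    "E": "Healthy Lifestyle",
    "F": "Provider Choices",
    "Z": "Other",
}
```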

🚀 How to Use

You can use this model directly with the transformers library pipeline for text-classification.

```python
from transformers import pipeline

# Load the classification pipeline
classifier = pipeline(
    "text-classification",
    model="FatimahEmadEldin/Sakinah-AI-AraBERT-Optimized",
    top_k=None,  # return scores for all labels (replaces the deprecated return_all_scores=True)
    function_to_apply="sigmoid",  # independent per-label probabilities for multi-label output
)

# Example question in Arabic
question = "ما هي أعراض الاكتئاب وكيف يمكن علاجه؟"
# (Translation: "What are the symptoms of depression and how can it be treated?")

results = classifier(question)

# --- Post-processing to get final labels ---
# The optimal threshold should be taken from the Optuna study results;
# the evaluation script uses a placeholder of 0.45. Replace it with the
# actual best_params['base_threshold'] value (0.2041 in the study reported below).
threshold = 0.45
predicted_labels = [item["label"] for item in results[0] if item["score"] > threshold]

print(f"Question: {question}")
print(f"Predicted Labels: {predicted_labels}")
# Expected for this example: ['A', 'B'] (Diagnosis and Treatment)
```
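
The same thresholding can also be done without the pipeline. Here is a minimal sketch using `AutoModelForSequenceClassification` with an explicit sigmoid, assuming the checkpoint stores the letter codes in its `id2label` mapping:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "FatimahEmadEldin/Sakinah-AI-AraBERT-Optimized"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

question = "ما هي أعراض الاكتئاب وكيف يمكن علاجه؟"
inputs = tokenizer(question, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Sigmoid, not softmax: each label is an independent binary decision
probs = torch.sigmoid(logits)[0]
threshold = 0.45  # replace with best_params['base_threshold']
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > threshold]
print(predicted)
```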

βš™οΈ Training Procedure

This model was fine-tuned using a rigorous hyperparameter optimization process.

Hyperparameters

The best hyperparameters were found by Optuna during training (arabert_optmized.py); the values below were retrieved from the study's output (study.best_params).


Optimization Results

| Metric | Value |
|---|---|
| Best trial F1 score | 0.6307 |

Best Hyperparameters Found

| Hyperparameter | Value |
|---|---|
| learning_rate | 5.273957732715589e-05 |
| num_train_epochs | 13 |
| weight_decay | 0.04131058607286182 |
| focal_alpha | 0.9702303056621574 |
| focal_gamma | 1.39543909126709 |
| base_threshold | 0.20408644287720523 |
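
For context, a search over these six hyperparameters could be set up along the following lines. This is a hedged sketch, not the exact arabert_optmized.py code: the search ranges are assumptions, and train_and_evaluate is a hypothetical helper that fine-tunes the model and returns the validation weighted F1 score.

```python
import optuna

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 15),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "focal_alpha": trial.suggest_float("focal_alpha", 0.25, 0.99),
        "focal_gamma": trial.suggest_float("focal_gamma", 0.5, 3.0),
        "base_threshold": trial.suggest_float("base_threshold", 0.1, 0.6),
    }
    # train_and_evaluate is a hypothetical helper: it fine-tunes AraBERT with
    # the focal loss and returns the validation weighted F1 score
    return train_and_evaluate(**params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```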

Frameworks

  • PyTorch
  • Hugging Face Transformers
  • Optuna

📊 Evaluation Results

The model was evaluated on the blind test set provided by the MentalQA organizers.

Final Test Set Scores

| Metric | Score |
|---|---|
| Weighted F1-Score | 0.543 |

Per-Label Performance (Test Set)

Note: The following is a placeholder. To generate the actual report, run the arabert_evaluate.py script with your final model and the official test data.

```
              precision    recall  f1-score   support

           A       0.65      0.81      0.72        84
           B       0.60      0.75      0.67        85
           C       0.00      0.00      0.00        10
           D       0.37      0.21      0.26        34
           E       0.41      0.37      0.39        38
           F       0.00      0.00      0.00         6
           Z       0.00      0.00      0.00         3

   micro avg       0.58      0.59      0.58       260
   macro avg       0.29      0.31      0.29       260
weighted avg       0.51      0.59      0.54       260
 samples avg       0.65      0.65      0.60       260
```
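
A report in this format can be produced with scikit-learn once predictions and gold labels are binarized over the seven classes. In this sketch, the y_true/y_pred arrays are placeholders standing in for the outputs of arabert_evaluate.py:

```python
import numpy as np
from sklearn.metrics import classification_report

LABELS = ["A", "B", "C", "D", "E", "F", "Z"]

# y_true / y_pred: (n_samples, 7) binary indicator arrays, e.g. obtained by
# applying the tuned threshold to the model's sigmoid outputs
y_true = np.array([[1, 1, 0, 0, 0, 0, 0]])  # placeholder example
y_pred = np.array([[1, 1, 0, 0, 0, 0, 0]])

print(classification_report(y_true, y_pred, target_names=LABELS, zero_division=0))
```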

📜 Citation

If you use our work, please cite our paper:

```bibtex
@inproceedings{elden2025sakinahai,
    title={{Sakinah-AI at MentalQA: A Comparative Study of Few-Shot, Optimized, and Ensemble Methods for Arabic Mental Health Question Classification}},
    author={Elden, Fatimah Emad and Abukar, Mumina},
    year={2025},
    booktitle={Proceedings of the MentalQA 2025 Shared Task},
    eprint={25XX.XXXXX},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```