MADRS-BERT

MADRS-BERT is a fine-tuned bert-base-german-cased model that predicts depression severity scores (0–6) across individual items of the Montgomery-Åsberg Depression Rating Scale (MADRS). Each prediction is based on transcribed, structured clinician–patient interview segments.

Publication: https://doi.org/10.21203/rs.3.rs-6555767/v1
Example dataset: https://github.com/webersamantha/MADRS-BERT/data
GitHub repo: The code for data curation, fine-tuning, and evaluation is available at https://github.com/webersamantha/MADRS-BERT

This model was developed to support standardized, scalable mental health assessments in both clinical and low-resource settings.

Model Details

Base model: bert-base-german-cased
Task: Ordinal regression (scores 0–6)
Language: German
Input: Text (dialogue segment grouped by MADRS topic)
Output: Predicted score for each MADRS item (rounded integer 0–6)
Training data: Mix of real and synthetic clinician–patient interviews (MADRS-structured)
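The output described above (a rounded integer between 0 and 6) can be illustrated with a small post-processing sketch. It assumes the model emits a single continuous value per dialogue segment, which is what the inference code in this card suggests; the helper name `to_madrs_item_score` is hypothetical and not part of the released code.

```python
def to_madrs_item_score(raw_output: float) -> int:
    """Map a raw regression output to an integer MADRS item score in [0, 6]."""
    # Python's round() rounds halves to the nearest even integer,
    # which matches the behaviour of torch.round.
    rounded = round(raw_output)
    # Clamp to the valid item range, so out-of-range outputs cannot
    # produce an invalid score.
    return max(0, min(6, rounded))


print(to_madrs_item_score(3.4))   # within range, rounds down
print(to_madrs_item_score(-0.7))  # below range, clamped to 0
print(to_madrs_item_score(7.2))   # above range, clamped to 6
```

Clamping after rounding keeps every prediction on the 0–6 item scale even when the raw output drifts slightly outside it.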

Intended Use

This model is intended for research and development use. It is not a certified medical device. The goal is to:

Explore AI-assisted symptom severity assessment
Enable structured evaluation of individual MADRS items
Support clinicians or researchers working in psychiatry/mental health

🚀 How to Use

Preprocess the data file:

Organize your data in the same layout as the (synthetic) example data, with the columns Subject, Speaker, Transcription, Topic, Score.
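To make the expected layout concrete, here is a minimal pandas-free sketch of a few rows and of how they collapse into one dialogue string per topic, mirroring the grouping done by the loader in this section. The subject ID, speaker labels, topic names, utterances, and scores are all invented for illustration, and the assumption that Score is repeated on every row of a topic is a guess; check the example dataset for the actual conventions.

```python
# Hypothetical rows: one utterance per row, five columns as named in this card.
rows = [
    {"Subject": "S01", "Speaker": "Interviewer",
     "Transcription": "Wie haben Sie in letzter Zeit geschlafen?",
     "Topic": "Schlaf", "Score": 3},
    {"Subject": "S01", "Speaker": "Patient",
     "Transcription": "Ich wache oft nachts auf.",
     "Topic": "Schlaf", "Score": 3},
    {"Subject": "S01", "Speaker": "Interviewer",
     "Transcription": "Wie ist Ihr Appetit?",
     "Topic": "Appetit", "Score": 1},
    {"Subject": "S01", "Speaker": "Patient",
     "Transcription": None,  # missing transcription, skipped below
     "Topic": "Appetit", "Score": 1},
]


def group_dialogue_by_topic(rows):
    """Join 'Speaker: utterance' lines per topic, skipping empty transcriptions."""
    lines_by_topic = {}
    for row in rows:
        if row["Transcription"] is None:
            continue
        line = f"{row['Speaker']}: {row['Transcription']}"
        lines_by_topic.setdefault(row["Topic"], []).append(line)
    return {topic: "\n".join(lines) for topic, lines in lines_by_topic.items()}


dialogues = group_dialogue_by_topic(rows)
print(dialogues["Schlaf"])
```

Each resulting string is what the model receives as input for one MADRS item.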


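import pandas as pd

def load_and_prepare_conversations(filepath):
    """Read one interview file and return a list of (topic, dialogue) pairs."""
    df = pd.read_excel(filepath)
    conversations = []

    # Rows with a missing Topic are skipped. All remaining rows that share a
    # topic are merged, so the file should contain a single interview.
    for topic in df['Topic'].dropna().unique():
        topic_df = df[df['Topic'] == topic]

        # One "Speaker: utterance" line per row, ignoring empty transcriptions.
        dialogue = "\n".join([
            f"{row['Speaker']}: {row['Transcription']}"
            for _, row in topic_df.iterrows()
            if pd.notnull(row['Transcription'])
        ])

        conversations.append((topic, dialogue))
    return conversations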

Load model and tokenizer:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "webersamantha/MADRS-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

Run inference on a full structured interview:

The function below scores the dialogue of each topic separately. The usage example assumes an interview file laid out as described above.

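def predict_madrs_scores(conversations, tokenizer, model):
    """Return a dict mapping each MADRS topic to its predicted integer score (0-6)."""
    device = model.device
    predictions = {}

    for topic, dialogue in conversations:
        # Dialogues longer than 512 tokens are truncated, so later turns may be dropped.
        inputs = tokenizer(dialogue, truncation=True, padding="max_length", max_length=512, return_tensors="pt").to(device)
        with torch.no_grad():
            # Round the single continuous output and clamp it to the 0-6 item range.
            score = torch.round(model(**inputs).logits).clamp(0, 6).item()
        predictions[topic] = int(score)

    return predictions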

file_path = "example_interview.xlsx"
conversations = load_and_prepare_conversations(file_path)
scores = predict_madrs_scores(conversations, tokenizer, model)
print(scores)

Acknowledgements

The model was trained and released by Samantha Weber within the framework of the Multicast Project on predicting and treating suicidality, as part of efforts to improve AI-driven mental health tools. Thanks to all clinicians and collaborators who contributed to the annotated MADRS dataset.

Evaluation

The model was evaluated on a held-out clinical validation set and achieved strong performance under both a strict scoring criterion (exact score match) and a flexible one (±1 deviation tolerance). See the publication for full metrics.
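One possible reading of the strict and flexible criteria is sketched here. It treats "strict" as an exact match between predicted and clinician-assigned item scores and "flexible" as agreement within ±1 point. The function name and the toy scores are hypothetical; the publication remains the authoritative source for how the metrics were actually computed.

```python
def strict_and_flexible_accuracy(true_scores, predicted_scores, tolerance=1):
    """Return (strict, flexible) accuracy for paired integer item scores."""
    if len(true_scores) != len(predicted_scores):
        raise ValueError("Both score lists must have the same length.")
    if not true_scores:
        raise ValueError("At least one score pair is required.")

    pairs = list(zip(true_scores, predicted_scores))
    # Strict: the prediction must equal the reference score exactly.
    strict = sum(t == p for t, p in pairs) / len(pairs)
    # Flexible: the prediction may deviate by at most `tolerance` points.
    flexible = sum(abs(t - p) <= tolerance for t, p in pairs) / len(pairs)
    return strict, flexible


# Toy example with invented scores for four items.
strict, flexible = strict_and_flexible_accuracy([0, 2, 4, 6], [0, 3, 2, 6])
print(strict, flexible)  # 0.5 0.75
```

Reporting both values side by side shows how many misses are near misses, which matters on an ordinal scale where adjacent scores are clinically close.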

Citation

If you use this model, please cite:

Weber, S. et al. (2025). "Using a Fine-tuned Large Language Model for Symptom-based Depression Evaluation." Preprint. https://doi.org/10.21203/rs.3.rs-6555767/v1
