ClinicalBERT-MS-Autoimmune-Neuro
A fine-tuned version of emilyalsentzer/Bio_ClinicalBERT for detecting autoimmune neurological disease signals in clinical text notes.
Maintainer: Vahid Mahmoudian
Repository: vhdm/clinicalbert-ms-autoimmune-neuro
Status: Research / Proof-of-Concept (not for standalone clinical use)
Model Summary
| Property | Value |
|---|---|
| Base model | emilyalsentzer/Bio_ClinicalBERT |
| Task | Binary text classification (autoimmune-neurological vs non-autoimmune) |
| Language | English clinical notes |
| Domain | Neurology / Autoimmune disorders |
| Dataset | Internal "MS-autoimmune" corpus (split into train, valid, test) |
| Sequence length | 512 tokens per chunk |
| Hardware | NVIDIA H100 |
| Mixed precision | bf16 |
| Trainer seed | 42 |
| Epochs | 4 |
| Learning rate | 2 × 10⁻⁵ |
| Optimizer | adamw_torch |
| Batch sizes | train = 24, eval = 48 |
| Warmup ratio | 0.1 |
| Best metric | recall |
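The summary table lists a sequence length of 512 tokens per chunk, so long notes are scored chunk by chunk. Below is a minimal chunking sketch using the tokenizer's overflow support; the 64-token stride (overlap) and the helper name are illustrative assumptions, not documented training settings.
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("vhdm/clinicalbert-ms-autoimmune-neuro")

def chunk_note(note: str, max_length: int = 512, stride: int = 64):
    """Hypothetical helper: split a long note into overlapping 512-token windows."""
    enc = tok(
        note,
        max_length=max_length,
        stride=stride,
        truncation=True,
        return_overflowing_tokens=True,
    )
    # Decode each window back to text so the chunks can be batched like ordinary inputs.
    return [tok.decode(ids, skip_special_tokens=True) for ids in enc["input_ids"]]

chunks = chunk_note("Patient presents with relapsing sensory symptoms ... " * 200)
print(len(chunks), "chunks")
```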
Training Log (chunk-level)
The metrics below are auto-generated by the Hugging Face Trainer on the raw validation set (before note-level aggregation, calibration, or threshold tuning).
| Step | Train Loss | Val Loss | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|
| 200 | 0.655 | 0.627 | 0.660 | 0.800 | 0.0377 | 0.0720 | 0.6259 |
| 800 | 0.331 | 0.373 | 0.840 | 0.882 | 0.628 | 0.733 | 0.8856 |
| 1600 | 0.275 | 0.360 | 0.851 | 0.854 | 0.691 | 0.764 | 0.9040 |
| 2400 | 0.291 | 0.333 | 0.858 | 0.878 | 0.689 | 0.772 | 0.9121 |
| 3400 | 0.220 | 0.358 | 0.861 | 0.851 | 0.731 | 0.786 | 0.9169 |
| 4600 | 0.169 | 0.432 | 0.856 | 0.828 | 0.744 | 0.784 | 0.9172 |
Final Trainer metrics (chunk-level):
These are raw chunk-level metrics logged for monitoring during training, not the final evaluation used for deployment.
Note-Level Aggregated Evaluation (final tuned results)
After post-processing with:
- Aggregation: logit_topk (k = 3)
- Calibration: temperature scaling (T = 1.372)
- Threshold: tuned on validation (Fβ = 1.5 → thr ≈ 1.000)
- Inference logic: per-note probability = sigmoid(mean of top-k chunk logits); see the sketch after this list
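As a reference for the list above, here is a minimal aggregation sketch assuming per-chunk positive-class probabilities have already been computed; the function name and example probabilities are illustrative, not taken from the released code.
```python
import numpy as np

def aggregate_note(chunk_probs, k=3, T=1.372, thr=1.0):
    """Hypothetical helper: logit_topk aggregation + temperature scaling + threshold."""
    p = np.clip(np.asarray(chunk_probs, dtype=float), 1e-6, 1 - 1e-6)
    logits = np.log(p) - np.log(1 - p)               # per-chunk logits
    top = np.sort(logits)[-min(k, len(logits)):]     # k highest-scoring chunks
    score = 1.0 / (1.0 + np.exp(-top.mean() / T))    # calibrated note-level probability
    return score, int(score >= thr)

score, pred = aggregate_note([0.20, 0.97, 0.88, 0.60])
print(score, pred)
```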
Validation (n = 493)
| Metric | Value |
|---|---|
| Precision | 0.9118 |
| Recall | 0.9394 |
| F1 | 0.9254 |
| Accuracy | 0.9493 |
| ROC-AUC | 0.9688 |
Test (n = 493)
| Metric | Value |
|---|---|
| Precision | 0.9618 |
| Recall | 0.9207 |
| F1 | 0.9408 |
| Accuracy | 0.9615 |
| ROC-AUC | 0.9786 |
Final configuration: Aggregation = logit_topk (k = 3), Temperature = 1.372, Threshold ≈ 1.0
Inference Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

repo = "vhdm/clinicalbert-ms-autoimmune-neuro"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

# In practice, these would be the 512-token chunks of a single clinical note.
texts = ["Patient reports numbness in lower limbs...", "MRI shows demyelination consistent with MS."]
inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).numpy()[:, 1]  # per-chunk P(autoimmune)

# logit_topk aggregation (k = 3): average the logits of the k highest-scoring chunks
probs = np.clip(probs, 1e-6, 1 - 1e-6)
logits_ = np.log(probs) - np.log(1 - probs)
k = 3
idx = np.argsort(logits_)[-min(k, len(logits_)):]
mean_logit = logits_[idx].mean()
note_score = 1.0 / (1.0 + np.exp(-mean_logit))  # uncalibrated note-level score (not used for the decision)

# Temperature scaling (T = 1.372) applied to the aggregated logit
T = 1.372
note_score_cal = 1.0 / (1.0 + np.exp(-mean_logit / T))

thr = 1.0  # tuned threshold
pred = int(note_score_cal >= thr)
print({"score": float(note_score_cal), "prediction": pred})
```
Reproducibility
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./runs/clinicalbert_ms",
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=48,
    num_train_epochs=5,
    weight_decay=0.01,
    bf16=True,
    optim="adamw_torch",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="steps",
    save_strategy="steps",
    logging_steps=50,
    eval_steps=200,
    save_steps=200,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="recall",
    greater_is_better=True,
)
```
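Because metric_for_best_model="recall" requires the Trainer to receive a compute_metrics function that returns a "recall" key (along with the precision/F1/AUC logged above), something along the following lines would be needed. This is a sketch assuming binary chunk-level labels; the exact function used for this run is not included in the card.
```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score

def compute_metrics(eval_pred):
    """Sketch of a chunk-level metric function compatible with metric_for_best_model='recall'."""
    logits, labels = eval_pred
    logits = logits - logits.max(axis=-1, keepdims=True)                 # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc_score(labels, probs[:, 1]),
    }
```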