T5-Small for Medical Report Labeling (Radiology NLP)

A fine-tuned t5-small model that extracts structured clinical labels from free-form radiologist diagnoses. It transforms raw diagnostic text into five key medical labels, supporting downstream machine learning and analysis in medical imaging.

Trained on real-world anonymized radiology data in collaboration with AIIMS, New Delhi.


Problem Statement

Medical reports, especially radiologist diagnoses, are often unstructured, verbose, and inconsistent. This project addresses that problem by creating a model that can extract:

  • Abnormal/Normal
  • Pathologies Extracted
  • Midline Shift
  • Location & Brain Organ
  • Bleed Subcategory
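The five labels above can be held in a small typed record when passing them to downstream code. The sketch below is only an illustration: the class name, the field names, and the string types are assumptions, and the example values are hand-written from the sample diagnosis in this card, not actual model output. It does not parse the model's generated text, whose exact format is not documented here.

```python
from dataclasses import dataclass, asdict


@dataclass
class RadiologyLabels:
    """Hypothetical container for the five labels listed in this card."""
    abnormal_or_normal: str   # "Abnormal/Normal"
    pathologies: str          # "Pathologies Extracted"
    midline_shift: str        # "Midline Shift"
    location_and_organ: str   # "Location & Brain Organ"
    bleed_subcategory: str    # "Bleed Subcategory"


# Hand-written illustration (NOT model output) for the sample diagnosis
# "Acute intracerebral hemorrhage with 4 mm midline shift and parietal lobe involvement."
example = RadiologyLabels(
    abnormal_or_normal="Abnormal",
    pathologies="intracerebral hemorrhage",
    midline_shift="4 mm",
    location_and_organ="parietal lobe",
    bleed_subcategory="intracerebral",
)

# A plain dict is convenient for writing rows to CSV/JSON next to a scan ID.
row = asdict(example)
```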

Use Case

The output of this model can be paired with MRI scans to train supervised models for diagnosis, segmentation, or triaging. This can also help hospitals build structured EMRs from legacy reports.


Model Details

  • Base Model: t5-small
  • Architecture: Seq2Seq
  • Trained On: Internal AIIMS-labeled Excel dataset
  • Framework: Hugging Face Transformers

Evaluation

The average test loss is 0.03.
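One way to read this number, assuming the reported loss is the mean token-level cross-entropy (in nats) that T5 returns by default, is to convert it to perplexity. That assumption is not confirmed by this card, so the snippet below is a sketch of the arithmetic only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Assumption: 0.03 is a mean per-token cross-entropy on the test set.
print(round(perplexity(0.03), 3))  # 1.03
```

A perplexity near 1 means the model assigns high probability to the reference tokens; it does not by itself say how often each of the five labels is exactly right.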


Example Input/Output

Input (Prompt)

Extract info: Acute intracerebral hemorrhage with 4 mm midline shift and parietal lobe involvement.

How to use


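from transformers import pipeline

# Load the fine-tuned checkpoint as a text-to-text generation pipeline.
pipe = pipeline("text2text-generation", model="gursmeep/t5-radiology-final")

# Prompts use the same "Extract info: " prefix as the training inputs.
prompt = "Extract info: Acute SDH with frontal lobe involvement and mild midline shift."
# Greedy decoding (do_sample=False) keeps the output deterministic.
result = pipe(prompt, max_length=256, do_sample=False)

print(result[0]["generated_text"])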
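For labeling many reports at once, the same call can be wrapped in a small helper. This is a minimal sketch, not part of the released model: `build_prompt` and `label_reports` are hypothetical names, and the whitespace normalization is an assumption about what helps with messy legacy text. The helper accepts any callable that behaves like the pipeline above, which also makes it easy to test without downloading the model.

```python
PREFIX = "Extract info: "  # input format stated in this card


def build_prompt(diagnosis: str) -> str:
    """Collapse whitespace and prepend the training prefix (hypothetical helper)."""
    return PREFIX + " ".join(diagnosis.split())


def label_reports(diagnoses, pipe, max_length=256):
    """Run a pipeline-like callable over a list of diagnoses.

    `pipe` is expected to accept a list of prompts and return one result
    per prompt, either a dict or a one-element list of dicts, each with a
    "generated_text" key.
    """
    if not diagnoses:
        return []
    prompts = [build_prompt(d) for d in diagnoses]
    outputs = pipe(prompts, max_length=max_length, do_sample=False)
    texts = []
    for out in outputs:
        item = out[0] if isinstance(out, list) else out
        texts.append(item["generated_text"])
    return texts
```

With the pipeline from the snippet above, `label_reports(["Acute SDH with mild midline shift."], pipe)` would return a list with one generated string per report.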

Dataset Background

  • Source: Excel sheet of annotated radiologist reports

  • Annotated via: GPT-4-assisted labeling

  • Origin: Data shared by the company during an internship project in collaboration with AIIMS

Training Setup

  • Trained on Colab GPU

  • Used Hugging Face Trainer and DataCollatorForSeq2Seq

  • 4 Epochs, Batch Size: 8

  • Input Format: "Extract info: {diagnosis text}"
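The setup above can be sketched roughly as follows. This is not the author's training script: the column names `diagnosis` and `target`, the maximum lengths, and the `output_dir` are assumptions, and every `TrainingArguments` value other than the 4 epochs and batch size of 8 is left at its library default.

```python
PREFIX = "Extract info: "  # input format stated in this card


def preprocess(batch, tokenizer, max_input_len=256, max_target_len=128):
    """Tokenize a batch of (diagnosis, target) pairs for seq2seq training.

    Column names and max lengths are assumptions, not taken from the card.
    """
    prompts = [PREFIX + text for text in batch["diagnosis"]]
    model_inputs = tokenizer(prompts, max_length=max_input_len, truncation=True)
    targets = tokenizer(
        text_target=batch["target"], max_length=max_target_len, truncation=True
    )
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs


def build_trainer(train_dataset, eval_dataset, output_dir="t5-radiology"):
    """Assemble a Trainer with the epochs and batch size listed above.

    Both datasets are expected to be tokenized already (e.g. via `preprocess`).
    """
    # Imported lazily so `preprocess` can be used without transformers installed.
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    args = TrainingArguments(
        output_dir=output_dir,           # hypothetical path
        num_train_epochs=4,              # from the card
        per_device_train_batch_size=8,   # from the card
    )
    # Pads inputs and labels per batch; label padding is ignored by the loss.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=collator,
    )
```

Calling `build_trainer(train_ds, eval_ds).train()` would then run the fine-tuning loop; `preprocess` is typically applied beforehand with a batched `map` over the dataset.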

Model Card Author

Developed by Gursmeep Kaur during a medical NLP internship project.

Model size: 60.5M params (Safetensors, tensor type F32)