You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

RadioCare

image

PLEASE REQUEST ACCESS WITH YOUR MIMIC-CXR ACCESS RIGHT TO APPROVE

About

This is the model checkpoint for BLIP ("Salesforce/blip-image-captioning-large") finetuned on the MIMIC-CXR dataset and generates accurate radiology reports given an chest X-ray and a clinical indication (e.g. 'eval for pneumonia'). Training and inference code is provided in GitHub repository

Code

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load model
processor = BlipProcessor.from_pretrained("adibvafa/BLIP-MIMIC-CXR")
model = BlipForConditionalGeneration.from_pretrained("adibvafa/BLIP-MIMIC-CXR")

# Load data
image = 'chest-x-ray.jpg'
prompt = 'final report\nexamination: chest (pa and lat)\nindication: ___f with chest pressure, uri sx, voice change.'

# Process inputs
inputs = processor(
    images=Image.open(image), 
    text=prompt,
    return_tensors="pt"
)

# Generate radiology report
output = model.generate(**inputs, max_length=512)
report = processor.decode(output[0], skip_special_tokens=True)
### report will be like follows:
final report
examination : chest ( pa and lat )
indication : ___f with chest pressure, uri sx, voice change.
comparison : none
findings : pa and lateral views of the chest provided. there is no focal consolidation, effusion, or pneumothorax. the cardiomediastinal silhouette is normal. imaged osseous structures are intact. no free air below the right hemidiaphragm is seen.
impression : no acute intrathoracic process.

Demo


Introduction

Radiocare aims to develop a cutting-edge image-to-text model that generates accurate radiology reports and diagnoses for chest X-ray images. By leveraging the BLIP and Vision Transformer architectures, Radiocare seeks to streamline the diagnostic process, enabling faster and more accurate identification of health issues. This project addresses the critical need for timely and precise radiological assessments, especially in rural areas with limited access to healthcare. Ultimately, Radiocare strives to improve patient outcomes and bridge the gap in healthcare accessibility across Canada.

Methods

Data

Radiocare utilizes data from the MIMIC-CXR database on PhysioNet, consisting of a large collection of chest X-ray images and associated radiology reports. This dataset provides a comprehensive source of medical images essential for training and evaluating the model.

Model Architecture

Radiocare employs the BLIP (Bootstrapped Language-Image Pre-training) model, which integrates the Vision Transformer (ViT) architecture with a text decoder. ViT processes images by dividing them into fixed-size patches, transforming these patches into high-dimensional vectors, and then embedding them into tokens. The self-attention mechanism in ViT captures global dependencies across patches, enhancing the model's understanding of the entire image. The text decoder translates these visual features into coherent radiology reports, enabling detailed and accurate diagnostics.

Results

Radiocare's model can assess a chest X-ray in approximately 3 seconds, providing doctors with a 99% faster diagnostic process. Key performance metrics include:

  • Bert Socre: 47.2
  • RadGraph: 26.1

Discussion

Radiocare represents a significant advancement in the field of medical diagnostics by leveraging state-of-the-art AI techniques to generate accurate and timely radiology reports from chest X-ray images. The integration of the BLIP model and Vision Transformer architecture enhances the diagnostic process, ensuring faster and more reliable results. By addressing the critical healthcare needs, especially in underserved rural areas, Radiocare has the potential to improve patient outcomes and bridge the gap in healthcare accessibility across Canada.

Team Information

Radiocare is part of the Spring 2024 cohort of Borealis AI's "Let's SOLVE It" program. The project team includes:

Repository Structure

The repository is organized as follows:

  • data_modules/: Contains data loading and preprocessing scripts.
  • evals/: Includes evaluation scripts and metrics calculation.
  • models/: Contains the different model architectures.
    • blip/: Final model implementation using BLIP and ViT.
    • cnn/: Convolutional neural network models.
    • vit/: Vision Transformer models.
  • utils/: Utility functions for the project.
  • slurm/: SLURM batch scripts for running jobs on a computing cluster.

Citation

If you use this work in your research, please cite:

@misc {adibvafa_fallahpour_2024,
    author       = { Fallahpour, Adibvafa and Srivastava, Archita and Dhillon, Mantaj and Liu, Grace },
    title        = { BLIP-MIMIC-CXR },
    year         = 2024,
    url          = { https://huggingface.co/adibvafa/BLIP-MIMIC-CXR },
    doi          = { 10.57967/hf/3207 },
    publisher    = { Hugging Face }
}
Downloads last month
10
Safetensors
Model size
470M params
Tensor type
F32
·
Inference API
Inference API (serverless) does not yet support transformers models for this pipeline type.

Model tree for adibvafa/BLIP-MIMIC-CXR

Finetuned
(5)
this model