chhatramani/Gemma3n_Radiology_v1

Fine-Tuned Gemma 3N for Medical VQA on ROCOv2

This repository hosts chhatramani/Gemma3n_Radiology_v1, a vision-language model (VLM) fine-tuned on the ROCOv2 radiography dataset for Medical Visual Question Answering (VQA). The model builds on Google's Gemma 3N architecture, with both its vision and language components fine-tuned for improved performance in the medical domain.

Please Note: This model was developed as an experimental project for research and educational purposes only. It is not intended for clinical use or to provide medical advice. Always consult with qualified medical professionals for diagnosis and treatment.

Model Description

chhatramani/Gemma3n_Radiology_v1 is built upon the powerful unsloth/gemma-3n-E2B-it base model. It has undergone parameter-efficient fine-tuning (PEFT) using LoRA adapters, specifically targeting both the vision and language layers, including attention and MLP modules. This approach allows for efficient training by updating only a small percentage of the model's parameters while achieving significant performance gains.

The fine-tuning process aimed to transform a general-purpose VLM into a specialized tool for medical professionals, capable of analyzing medical images (X-rays, CT scans, ultrasounds) and understanding expert-written captions describing medical conditions and diseases.

Training Details

The model was fine-tuned using the following key technologies and methodologies:

  • Base Model: unsloth/gemma-3n-E2B-it
  • Fine-tuning Frameworks: Unsloth, Hugging Face Transformers, TRL
  • PEFT Method: LoRA (Low-Rank Adaptation), with the settings below (see the configuration sketch after this list)
    • finetune_vision_layers: True
    • finetune_language_layers: True
    • finetune_attention_modules: True
    • finetune_mlp_modules: True
    • r: 16
    • lora_alpha: 16
    • lora_dropout: 0
    • bias: "none"
    • random_state: 3407
    • use_rslora: False
    • loftq_config: None
    • target_modules: "all-linear"
    • modules_to_save: ["lm_head", "embed_tokens"]
  • Dataset: unsloth/Radiology_mini, a sampled version of the ROCOv2 radiography dataset. It consists of medical images (X-rays, CT scans, ultrasounds) paired with expert-written captions. The full ROCOv2 dataset is also publicly available.
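As referenced above, the LoRA settings map directly onto Unsloth's FastVisionModel.get_peft_model call. A minimal configuration sketch (the base-model load is an assumed setup step, matching the base model listed above):

from unsloth import FastVisionModel

# Load the base model in 4-bit (assumed setup; matches the base model above)
model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3n-E2B-it",
    load_in_4bit = True,
)

# Attach LoRA adapters with the settings listed above
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True,
    finetune_language_layers = True,
    finetune_attention_modules = True,
    finetune_mlp_modules = True,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    target_modules = "all-linear",
    modules_to_save = ["lm_head", "embed_tokens"],
)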

Installation (for local reproduction/usage)

To use this model or reproduce its training, install the necessary libraries. Unsloth is recommended for optimized performance.

# For Colab notebooks (or similar environments)
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
!pip install --no-deps unsloth

# Install latest transformers and timm for Gemma 3N compatibility
!pip install --no-deps transformers==4.53.1
!pip install --no-deps --upgrade timm
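
After installation, a quick sanity check confirms the pinned versions are in place (a minimal sketch; the version attributes are standard):

# Verify the versions required for Gemma 3N compatibility
import transformers, timm
print(transformers.__version__)  # should print 4.53.1
print(timm.__version__)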

Usage (Inference)
To run inference, load the model with Unsloth's FastVisionModel, which wraps the Hugging Face Transformers API.

from unsloth import FastVisionModel
import torch
from PIL import Image

# Load the model and its processor (Unsloth returns both)
model, processor = FastVisionModel.from_pretrained(
    "chhatramani/Gemma3n_Radiology_v1",
    load_in_4bit = True, # Use 4-bit weights at inference to reduce memory use
)
FastVisionModel.for_inference(model) # Switch Unsloth into inference mode

# Example usage:
# You can replace this with your own medical image and question
image_path = "path/to/your/medical_image.jpg" # Replace with an actual image path
image = Image.open(image_path).convert("RGB")

# Gemma 3N is a chat model, so wrap the question in a chat-format message
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What medical condition is shown in this image?"},
    ]},
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Prepare inputs and generate a response
inputs = processor(image, input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)

# Decode and print the output
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
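
For interactive use, streaming tokens as they are generated is a common pattern in Unsloth examples. A minimal sketch, reusing the model, processor, and inputs from above:

from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt echo
streamer = TextStreamer(processor, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=200)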

Dataset Information
The ROCOv2 (Radiology Objects in Context) dataset is a comprehensive collection of radiology images and their corresponding expert-written captions. The sampled version used for this fine-tuning, unsloth/Radiology_mini, provides a subset suitable for efficient experimentation and training.

Dataset Features:

  • image: The medical image (X-ray, CT scan, or ultrasound).
  • image_id: Unique identifier for the image.
  • caption: Expert-written description of the medical image.
  • cui: Concept Unique Identifier (from the UMLS Metathesaurus), providing standardized medical terminology.
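
To explore the sampled dataset locally, you can load it with the datasets library. A minimal sketch, assuming a train split and using the field names listed above:

from datasets import load_dataset

# Load the sampled ROCOv2 subset used for this fine-tune
dataset = load_dataset("unsloth/Radiology_mini", split="train")

sample = dataset[0]
print(sample["caption"])  # expert-written caption
print(sample["cui"])      # UMLS Concept Unique Identifier
sample["image"]           # PIL image; display it in a notebook or call .show()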