---
base_model: unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2_vl
  - trl
  - VisionQA
license: apache-2.0
language:
  - en
datasets:
  - hamzamooraj99/PMC-VQA-1
---

# MedQA-Qwen-2B-LoRA16

Fine-tuned Qwen2-VL-2B on PMC-VQA (version 1) for Medical Visual Question Answering. Uses LoRA (rank=16) to adapt vision and language layers.
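
A minimal inference sketch with Unsloth, following its vision notebooks (the repo id is assumed from the title above, and the image path and question are placeholders):

```python
from unsloth import FastVisionModel
from PIL import Image

# Repo id assumed from this card's title — adjust if it differs
model, tokenizer = FastVisionModel.from_pretrained(
    "hamzamooraj99/MedQA-Qwen-2B-LoRA16",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

image = Image.open("PMC_1.jpg")  # placeholder figure from the dataset
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Which organ is shown in this figure?"},
    ],
}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, add_special_tokens=False,
                   return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```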

## Model Details

- **Base Model:** Qwen2-VL-2B
- **Fine-tuned on:** PMC-VQA (compounded images)
- **Fine-tuning Method:** LoRA (rank=16, alpha=16, dropout=0)
- **Layers Updated:** vision, attention, language and MLP modules
- **Optimiser:** AdamW (8-bit)
- **Batch Size:** 5 per device (gradient accumulation = 2)
- **Learning Rate:** 2e-4
- **Training Time:** 572.73 minutes (~9.5 hours)
- **Peak GPU Usage:** 8.0 GB (RTX 4080 Super)
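
The LoRA setup above replaces full fine-tuning with a trainable low-rank update on each adapted weight. A self-contained sketch of the idea (rank and alpha are taken from this card; the 1536 hidden size and all names are illustrative assumptions, not the card's code):

```python
import numpy as np

rank, alpha = 16, 16     # as configured above; scaling = alpha / rank = 1
d_out, d_in = 1536, 1536  # assumed hidden size for an adapted linear layer

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + (alpha / rank) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

# With B zero-initialised, the adapter starts as an exact no-op:
x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters per adapted layer: r * (d_in + d_out)
print(rank * (d_in + d_out))  # 49152
```

Only `A` and `B` receive gradients, which is why a 2B-parameter model fits in ~8 GB of GPU memory during training.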

## Dataset

### [PMC-VQA-1](https://huggingface.co/datasets/hamzamooraj99/PMC-VQA-1)

- 226,948 samples split into train, validation and test sets
- Data fields:

| Feature | Description |
|---|---|
| `Figure_path` | Filename of the corresponding image (e.g., `PMC_1.jpg`) |
| `Question` | The medical question related to the image |
| `Answer` | The correct answer to the question |
| `Choice A` | Option A for the multiple-choice question |
| `Choice B` | Option B for the multiple-choice question |
| `Choice C` | Option C for the multiple-choice question |
| `Choice D` | Option D for the multiple-choice question |
| `Answer_label` | Letter of the correct answer choice (A, B, C or D) |
| `image` | The image data, stored as a PIL `Image` object |

- Preprocessing:
  - Images resized by the processor: `max_pixels` = 256×256, `min_pixels` = 224×224
  - No additional augmentation
- Created from the original [PMC-VQA](https://huggingface.co/datasets/xmcmic/PMC-VQA) dataset
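
The resizing bullet above describes pixel-area budgets rather than fixed shapes: the Qwen2-VL processor rescales each image so its area falls between `min_pixels` and `max_pixels` while preserving aspect ratio. A minimal sketch of that logic (the factor-of-28 snapping follows Qwen2-VL's 2×2 patch merging; the real processor's rounding may differ in edge cases):

```python
import math

MIN_PIXELS = 224 * 224  # min_pixels budget from this card
MAX_PIXELS = 256 * 256  # max_pixels budget from this card
FACTOR = 28  # Qwen2-VL merges 14-px patches 2x2, so sides snap to 28

def smart_resize(height: int, width: int) -> tuple[int, int]:
    # Snap both sides to the patch-grid factor
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > MAX_PIXELS:
        # Shrink uniformly until the area fits the budget
        beta = math.sqrt((height * width) / MAX_PIXELS)
        h = math.floor(height / beta / FACTOR) * FACTOR
        w = math.floor(width / beta / FACTOR) * FACTOR
    elif h * w < MIN_PIXELS:
        # Enlarge uniformly to reach the minimum budget
        beta = math.sqrt(MIN_PIXELS / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w

print(smart_resize(1000, 1000))  # (252, 252): shrunk to fit 256*256 pixels
print(smart_resize(100, 100))    # (224, 224): enlarged to reach 224*224 pixels
```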

## Training Performance

| Step | Training Loss | Validation Loss |
|---|---|---|
| 9000 | 0.877100 | 0.778581 |
| 10000 | 0.742300 | 0.774723 |
| 11000 | 0.749400 | 0.772927 |
| 12000 | 0.857600 | 0.769148 |
| 13000 | 0.786700 | 0.766358 |
| 14000 | 0.717500 | 0.765929 |
| 15000 | 0.737700 | 0.764269 |

Training was stable, with minimal overfitting: validation loss decreased steadily while training loss fluctuated within a narrow band.
Training was resumed from a checkpoint after a hardware timeout, which may have reduced overall training efficiency.


## Uploaded model

- **Developed by:** hamzamooraj99
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit

This qwen2_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.