---
base_model: unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2_vl
  - trl
  - VisionQA
license: apache-2.0
language:
  - en
datasets:
  - hamzamooraj99/PMC-VQA-1
---

# MedQA-Qwen-2B-LoRA16

Fine-tuned Qwen2-VL-2B on PMC-VQA (version 1) for Medical Visual Question Answering. Uses LoRA (rank=16) to adapt vision and language layers.
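
A minimal inference sketch with Unsloth, following its vision notebooks (the repo id is assumed from the title above, and the image path and question are placeholders):

```python
from unsloth import FastVisionModel
from PIL import Image

# Repo id assumed from this card's title — adjust if it differs
model, tokenizer = FastVisionModel.from_pretrained(
    "hamzamooraj99/MedQA-Qwen-2B-LoRA16",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

image = Image.open("PMC_1.jpg")  # placeholder figure from the dataset
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Which organ is shown in this figure?"},
    ],
}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, add_special_tokens=False,
                   return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```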

## Model Details

- **Base Model:** Qwen2-VL-2B
- **Fine-tuned on:** PMC-VQA (compounded images)
- **Fine-tuning Method:** LoRA (rank=16, alpha=16, dropout=0)
- **Layers Updated:** vision, attention, language and MLP modules
- **Optimiser:** AdamW (8-bit)
- **Batch Size:** 5 per device (gradient accumulation = 2)
- **Learning Rate:** 2e-4
- **Training Time:** 572.73 minutes (~9.5 hours)
- **Peak GPU Usage:** 8.0 GB (RTX 4080 Super)
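
The LoRA setup above replaces full fine-tuning with a trainable low-rank update on each adapted weight. A self-contained sketch of the idea (rank and alpha are taken from this card; the 1536 hidden size and all names are illustrative assumptions, not the card's code):

```python
import numpy as np

rank, alpha = 16, 16     # as configured above; scaling = alpha / rank = 1
d_out, d_in = 1536, 1536  # assumed hidden size for an adapted linear layer

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + (alpha / rank) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

# With B zero-initialised, the adapter starts as an exact no-op:
x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters per adapted layer: r * (d_in + d_out)
print(rank * (d_in + d_out))  # 49152
```

Only `A` and `B` receive gradients, which is why a 2B-parameter model fits in ~8 GB of GPU memory during training.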

## Dataset

### [PMC-VQA-1](https://huggingface.co/datasets/hamzamooraj99/PMC-VQA-1)

- 226,948 samples split into train, validation and test sets
- Data fields:

| Feature | Description |
|---|---|
| `Figure_path` | Filename of the corresponding image (e.g., `PMC_1.jpg`) |
| `Question` | The medical question related to the image |
| `Answer` | The correct answer to the question |
| `Choice A` | Option A for the multiple-choice question |
| `Choice B` | Option B for the multiple-choice question |
| `Choice C` | Option C for the multiple-choice question |
| `Choice D` | Option D for the multiple-choice question |
| `Answer_label` | Letter of the correct answer choice (A, B, C or D) |
| `image` | The image data, stored as a PIL `Image` object |

- Preprocessing:
  - Images resized by the processor: `max_pixels` = 256×256, `min_pixels` = 224×224
  - No additional augmentation
- Created from the original [PMC-VQA](https://huggingface.co/datasets/xmcmic/PMC-VQA) dataset
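
The resizing bullet above describes pixel-area budgets rather than fixed shapes: the Qwen2-VL processor rescales each image so its area falls between `min_pixels` and `max_pixels` while preserving aspect ratio. A minimal sketch of that logic (the factor-of-28 snapping follows Qwen2-VL's 2×2 patch merging; the real processor's rounding may differ in edge cases):

```python
import math

MIN_PIXELS = 224 * 224  # min_pixels budget from this card
MAX_PIXELS = 256 * 256  # max_pixels budget from this card
FACTOR = 28  # Qwen2-VL merges 14-px patches 2x2, so sides snap to 28

def smart_resize(height: int, width: int) -> tuple[int, int]:
    # Snap both sides to the patch-grid factor
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > MAX_PIXELS:
        # Shrink uniformly until the area fits the budget
        beta = math.sqrt((height * width) / MAX_PIXELS)
        h = math.floor(height / beta / FACTOR) * FACTOR
        w = math.floor(width / beta / FACTOR) * FACTOR
    elif h * w < MIN_PIXELS:
        # Enlarge uniformly to reach the minimum budget
        beta = math.sqrt(MIN_PIXELS / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w

print(smart_resize(1000, 1000))  # (252, 252): shrunk to fit 256*256 pixels
print(smart_resize(100, 100))    # (224, 224): enlarged to reach 224*224 pixels
```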

## Training Performance

| Step | Training Loss | Validation Loss |
|---|---|---|
| 9000 | 0.877100 | 0.778581 |
| 10000 | 0.742300 | 0.774723 |
| 11000 | 0.749400 | 0.772927 |
| 12000 | 0.857600 | 0.769148 |
| 13000 | 0.786700 | 0.766358 |
| 14000 | 0.717500 | 0.765929 |
| 15000 | 0.737700 | 0.764269 |

Training was stable, with minimal overfitting: validation loss decreased steadily while training loss fluctuated within a narrow band.
Training was resumed from a checkpoint after a hardware timeout, which may have reduced overall training efficiency.


## Uploaded model

- **Developed by:** hamzamooraj99
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit

This qwen2_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.