metadata
base_model: unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2_vl
- trl
- VisionQA
license: apache-2.0
language:
- en
datasets:
- hamzamooraj99/PMC-VQA-1
MedQA-Qwen-2B-LoRA16
Fine-tuned Qwen2-VL-2B on PMC-VQA (version 1) for Medical Visual Question Answering. Uses LoRA (rank=16) to adapt vision and language layers.
Model Details
- Base Model: Qwen2-VL-2B
- Fine-tuned on: PMC-VQA (Compounded images)
- Fine-tuning Method: LoRA (Rank=16, Alpha=16, Dropout=0)
- Layers Updated: Vision, Attention, Language, MLP Modules
- Optimiser: AdamW (8-bit)
- Batch Size: 5 per device (Gradient Accumulation = 2)
- Learning Rate: 2e-4
- Training Time: 572.73 minutes (~9.5 hours)
- Peak GPU Usage: 8.0GB (RTX 4080 Super)
Dataset
PMC-VQA-1 # Replace with the actual dataset name
- 226,948 samples split into train, validation and test sets
- Data Fields include:
Feature Description Figure_path
:The filename of the corresponding image (e.g., "PMC_1.jpg"). Question
:The medical question related to the image. Answer
:The correct answer to the question. Choice A
:Option A for the multiple-choice question. Choice B
:Option B for the multiple-choice question. Choice C
:Option C for the multiple-choice question. Choice D
:Option D for the multiple-choice question. Answer_label
:The index label of the correct answer choice (A, B, C, D). image
:The actual image data, stored as a PIL Image object. - Preprocessing:
- Images resized:
max_pixels = 256x256
,min_pixels = 224x224
- No additional augmentation
- Images resized:
- Dataset created from PMC-VQA dataset https://huggingface.co/datasets/xmcmic/PMC-VQA.
Training Performance
Step | Training Loss | Validation Loss |
---|---|---|
9000 | 0.877100 | 0.778581 |
10000 | 0.742300 | 0.774723 |
11000 | 0.749400 | 0.772927 |
12000 | 0.857600 | 0.769148 |
13000 | 0.786700 | 0.766358 |
14000 | 0.717500 | 0.765929 |
15000 | 0.737700 | 0.764269 |
✅ Stable training with minimal overfitting observed.
❌ Training was resumed from a checkpoint due to a hardware timeout, which might have affected overall training efficiency.
Uploaded model
- Developed by: hamzamooraj99
- License: apache-2.0
- Finetuned from model : unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit
This qwen2_vl model was trained 2x faster with Unsloth and Huggingface's TRL library.