# Qwen/Qwen2-VL-2B-Instruct Fine-Tuned for Skin Disease Classification

## Model Overview

This model is a fine-tuned version of the Qwen/Qwen2-VL-2B-Instruct model, specifically adapted for classifying skin diseases from images. It was trained on a dataset of skin disease images and corresponding labels.

## Model Details

- Model Name: Qwen2-VL-7B-Instruct-Skin-Disease
- Base Model: Qwen/Qwen2-VL-2B-Instruct
- Fine-Tuning Dataset: Skin Diseases Image Dataset (Kaggle)
- Training Parameters:
  - Number of epochs: 3
  - Batch size: 4 (per device)
  - Gradient accumulation steps: 8
  - Learning rate: 2e-4
  - Optimizer: AdamW
- Hardware: NVIDIA GPU (e.g., A100)
- Training Time: Approximately 20 hours

## How to Use

### Installation

To use this model, you'll need to install the following dependencies:

```bash
pip install -U -q git+https://github.com/huggingface/transformers.git git+https://github.com/huggingface/trl.git datasets bitsandbytes peft qwen-vl-utils wandb accelerate
pip install -q torch==2.4.1+cu121 torchvision==0.19.1+cu121 torchaudio==2.4.1+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install qwen-vl-utils
```

### Loading the Model

Load the fine-tuned model and processor:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, Qwen2VLProcessor

model_id = "your_username/qwen2-7b-instruct-trl-sft-skin-disease"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = Qwen2VLProcessor.from_pretrained(model_id)
```

### Generating Predictions

To generate predictions for a skin disease image, use the following function:

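```python
from qwen_vl_utils import process_vision_info


def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    # `sample` is a list of chat messages (system and user turns only).
    text_input = processor.apply_chat_template(
        sample, tokenize=False, add_generation_prompt=True
    )
    # Collect the images referenced in the messages.
    image_inputs, _ = process_vision_info(sample)
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the generated answer is decoded.
    trimmed_generated_ids = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        trimmed_generated_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
    return output_text[0]


# Example usage
system_message = (
    "You are a Vision Language Model specialized in identifying skin diseases. "
    "Your task is to analyze the provided image and classify the skin disease shown. "
    "Focus on delivering accurate, concise labels based on the visual information."
)
sample = [
    {"role": "system", "content": [{"type": "text", "text": system_message}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/your/image.jpg"},
            {"type": "text", "text": "What skin disease is shown in this image?"},
        ],
    },
]
output = generate_text_from_sample(model, processor, sample)
print(output)
```

### System Message

The model was trained with the following system message: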

```text
You are a Vision Language Model specialized in identifying skin diseases. Your task is to analyze the provided image and classify the skin disease shown. Focus on delivering accurate, concise labels based on the visual information.
```

Include this system message in your prompts for optimal performance.

## Performance

The model was evaluated on a held-out test set from the skin disease dataset. The performance metrics are as follows:

- Accuracy: [To be filled after evaluation]
- Precision: [To be filled after evaluation]
- Recall: [To be filled after evaluation]
- F1 Score: [To be filled after evaluation]

## Limitations

- The model's performance may vary depending on the quality and resolution of the input images.
- It may not perform well on skin diseases not represented in the training dataset.
- The model's predictions should be used as a tool for medical professionals and not as a definitive diagnosis.

## Ethical Considerations

- This model should not be used for self-diagnosis or to replace professional medical advice.
- Ensure that the use of this model complies with privacy regulations regarding medical data.

## License

This model is released under the [License Name] license. Please refer to the license file for more details.

This model card provides a comprehensive guide on how to use the fine-tuned Qwen/Qwen2-VL-2B-Instruct model for skin disease classification. Make sure to fill in the performance metrics after evaluating the model on your test set.
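The accuracy, precision, recall, and F1 placeholders could be filled in along the following lines. This is only a minimal sketch under stated assumptions: the card does not say how the metrics are averaged, so macro-averaging over the classes present in the reference labels is assumed here, and generated text is matched to labels after simple whitespace and case normalisation. The function name `classification_metrics` and the label lists are hypothetical and used for illustration only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro-averaging over the classes that appear in y_true.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Normalise free-text generations so that "Eczema " and "eczema" match.
    y_true = [label.strip().lower() for label in y_true]
    y_pred = [label.strip().lower() for label in y_pred]

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, for illustration only (not results from this model).
reference = ["eczema", "eczema", "melanoma", "acne"]
generated = ["Eczema", "melanoma", "melanoma", "acne "]
print(classification_metrics(reference, generated))
```

In practice, `generated` would hold the outputs of `generate_text_from_sample` over the test set and `reference` the corresponding dataset labels. A library such as scikit-learn offers equivalent metrics, though its default handling of the label set may differ from the assumption made here.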