
Asset from the SCALEMED Framework

This repository is an asset released as part of the SCALEMED framework, a project focused on developing scalable, resource-efficient medical AI assistants.

Project Overview

The resulting models, DermatoLlama, were trained on versions of the DermaSynth dataset, which was itself generated with the SCALEMED pipeline.

For a complete overview of the project, including all related models, datasets, and the source code, please visit our main Hugging Face organization page: https://huggingface.co/DermaVLM

Usage

import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model (bfloat16 + device_map="auto" keep memory usage manageable)
base_model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(base_model_name)

# Load LoRA adapter
adapter_path = "DermaVLM/DermatoLLama-10k"
model = PeftModel.from_pretrained(model, adapter_path)
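
# Optionally merge the LoRA weights into the base model for faster inference
# (standard PEFT API; the model also works unmerged, as loaded above):
# model = model.merge_and_unload()
model.eval()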

# Inference
image_path = "DERM12345.jpg"
image = Image.open(image_path).convert("RGB")
prompt_text = "Describe the image in detail."
# Build a single-turn chat message with one image part and one text part
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt_text},
        ],
    }
]

input_text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
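
# input_text now holds the templated prompt, including the image placeholder
# token that tells the processor where the image belongs.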

# Prepare final inputs
inputs = processor(
    images=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

generation_config = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.4,
    "top_p": 0.95,
}
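
# For deterministic, reproducible outputs, set do_sample=False and
# remove temperature/top_p.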

input_length = inputs.input_ids.shape[1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        **generation_config,
        pad_token_id=(
            processor.tokenizer.pad_token_id
            if processor.tokenizer.pad_token_id is not None
            else processor.tokenizer.eos_token_id
        ),
    )
    generated_tokens = outputs[0][input_length:]
    raw_output = processor.decode(generated_tokens, skip_special_tokens=True)

print(raw_output)
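
If GPU memory is tight, the base model can also be loaded in 4-bit before attaching the adapter. Below is a minimal sketch, assuming the bitsandbytes package is installed; the quantization settings are illustrative defaults, not values tied to this release.

import torch
from transformers import MllamaForConditionalGeneration, BitsAndBytesConfig

# Illustrative 4-bit quantization config; tune for your hardware
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Then attach the LoRA adapter with PeftModel.from_pretrained as above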

Citation

If you use this model, dataset, or any other asset from our work in your research, please cite our preprint:

@article{Yilmaz2025-DermatoLlama-VLM,
    author = {Yilmaz, Abdurrahim and Yuceyalcin, Furkan and Varol, Rahmetullah and Gokyayla, Ece and Erdem, Ozan and Choi, Donghee and Demircali, Ali Anil and Gencoglan, Gulsum and Posma, Joram M. and Temelkuran, Burak},
    title = {Resource-efficient medical vision language model for dermatology via a synthetic data generation framework},
    year = {2025},
    doi = {10.1101/2025.05.17.25327785},
    url = {https://www.medrxiv.org/content/early/2025/07/30/2025.05.17.25327785},
    journal = {medRxiv}
}