DoctorHimel_V1

This model was fine-tuned using LoRA (Low-Rank Adaptation), a technique that introduces small low-rank matrices to adapt a pre-trained model to a specific task.

It is a fine-tuned version of the unsloth/gemma-2b-it-bnb-4bit model, specialized for answering medical and clinical questions.

Fine-tuning was done with Hugging Face's TRL library and accelerated with Unsloth, allowing faster training and inference while keeping memory usage low through 4-bit quantization.

Because LoRA updates only a small subset of trainable parameters, it drastically reduces VRAM consumption and speeds up training without sacrificing performance, making it well suited to resource-constrained environments such as Google Colab.
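The snippet below is a minimal, illustrative sketch of what an Unsloth + TRL LoRA fine-tuning setup of this kind looks like. The dataset name, LoRA rank, target modules, and training hyperparameters are placeholders rather than the exact values used to train DoctorHimel_V1, and some argument names may differ across TRL versions.

```python
# Illustrative LoRA fine-tuning sketch (placeholder dataset and hyperparameters)
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b-it-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # illustrative LoRA rank
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Supervised fine-tuning with TRL on a medical Q&A dataset (placeholder name)
dataset = load_dataset("your-org/medical-qa-dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",              # column holding the formatted prompts
    max_seq_length=1024,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```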


Model Details

Model Description

This model is built on top of Google’s Gemma 2B instruction-tuned variant (gemma-2b-it), further optimized with 4-bit quantization using bnb-nf4 to reduce memory consumption and improve inference speed on consumer hardware.
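For reference, here is a minimal sketch of an equivalent NF4 setup with bitsandbytes; the published checkpoint already carries its own quantization settings, so this only illustrates what such a configuration looks like.

```python
# Illustrative 4-bit NF4 quantization config (the checkpoint ships its own settings)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-2b-it-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```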

The model has been fine-tuned specifically for medical Q&A tasks and can assist with diagnostic reasoning, symptom analysis, treatment suggestions, and more.

  • Developed by: Himel
  • Finetuned from model: unsloth/gemma-2b-it-bnb-4bit
  • Model type: Causal Language Model (Decoder-only)
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Quantization: 4-bit NF4 via BitsAndBytes
  • Training Framework: TRL + Unsloth

Uses

Direct Use

Use this model for generating responses to medical questions, including:

  • Diagnosing symptoms
  • Explaining treatments
  • Summarizing clinical findings
  • Answering patient queries

Downstream Use

The model can be used as a base for:

  • Medical chatbots
  • Educational tools for students
  • Clinical decision support systems

Out-of-Scope Use

This model should not be used for:

  • Final medical diagnosis without human oversight
  • Emergency health advice
  • Legal or binding decisions

Bias, Risks, and Limitations

As with any language model, there may be cases where:

  • Responses are incorrect or misleading
  • Biases in training data affect output
  • Medical advice lacks nuance or context

Always verify critical information with trained professionals or authoritative sources.

Recommendations

Users should treat this model as an assistant, not a replacement for professional medical advice.


How to Get Started with the Model

Install Required Libraries

For Google Colab:

1. Check the Python version

```python
!python --version
```

2. Clean the pip cache

```python
!pip cache purge
```

3. Install dependencies

```python
!pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
!pip install transformers accelerate
!pip install bitsandbytes
!pip install -U peft
!pip install huggingface_hub[hf_xet]
```
4. Check the installed torch versions

```python
import torch
import torchvision
import torchaudio

print(f"Torch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Torchaudio version: {torchaudio.__version__}")
```
5. Check whether a CUDA device is available

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU found")
```
6. Run a quick GPU sanity check

```python
# Create a tensor and move it to the GPU
tensor = torch.randn(1000, 1000).cuda()

# Perform a matrix multiplication on the GPU
result = torch.matmul(tensor, tensor)

print(f"Result shape: {result.shape}")
```
7. Move a sample model to the GPU

```python
import torch.nn as nn

# Sample neural network
model = nn.Sequential(
    nn.Linear(1000, 500),
    nn.ReLU(),
    nn.Linear(500, 10)
)

# Move the model to the GPU
model = model.cuda()

# Sample input data (1000 samples, 1000 features)
inputs = torch.randn(1000, 1000).cuda()

# Forward pass
output = model(inputs)
print(output.shape)
```
8. Load the fine-tuned model and tokenizer

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "himel06/DoctorHimel_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model directly, without a separate adapter or LoRA configuration
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)
```
9. Define the prompt template

```python
prompt_template = """
Below is a medical question. Please provide a detailed and accurate response based on your knowledge.

### Question:
{}

### Answer:
"""
```
10. Format an example question

```python
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

input_text = prompt_template.format(question)
```
11. Generate and print the response

```python
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=400,                  # cap on generated tokens
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    temperature=0.7,                     # mild sampling randomness
    do_sample=True,
)

# Decode and strip the prompt so only the model's answer is printed
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.replace(input_text, "").strip())
```