DoctorHimel_V1

This model was fine-tuned using LoRA (Low-Rank Adaptation), a technique that introduces small low-rank matrices to adapt a pre-trained model to a specific task.

It is a fine-tuned version of the unsloth/gemma-2b-it-bnb-4bit model, specialized for answering medical and clinical questions.

Fine-tuning was done with Hugging Face's TRL library and accelerated with Unsloth, allowing faster training and inference while keeping memory usage low through 4-bit quantization.

Because LoRA updates only a small subset of trainable parameters, it drastically reduces VRAM consumption and speeds up training without sacrificing performance, making it well suited to resource-constrained environments such as Google Colab.
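The snippet below is a minimal, illustrative sketch of what an Unsloth + TRL LoRA fine-tuning setup of this kind looks like. The dataset name, LoRA rank, target modules, and training hyperparameters are placeholders rather than the exact values used to train DoctorHimel_V1, and some argument names may differ across TRL versions.

```python
# Illustrative LoRA fine-tuning sketch (placeholder dataset and hyperparameters)
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b-it-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # illustrative LoRA rank
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Supervised fine-tuning with TRL on a medical Q&A dataset (placeholder name)
dataset = load_dataset("your-org/medical-qa-dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",              # column holding the formatted prompts
    max_seq_length=1024,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```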


Model Details

Model Description

This model is built on top of Google’s Gemma 2B instruction-tuned variant (gemma-2b-it), further optimized with 4-bit quantization using bnb-nf4 to reduce memory consumption and improve inference speed on consumer hardware.
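For reference, here is a minimal sketch of an equivalent NF4 setup with bitsandbytes; the published checkpoint already carries its own quantization settings, so this only illustrates what such a configuration looks like.

```python
# Illustrative 4-bit NF4 quantization config (the checkpoint ships its own settings)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-2b-it-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```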

The model has been fine-tuned specifically for medical Q&A tasks and can assist with diagnostic reasoning, symptom analysis, treatment suggestions, and more.

  • Developed by: Himel
  • Finetuned from model: unsloth/gemma-2b-it-bnb-4bit
  • Model type: Causal Language Model (Decoder-only)
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Quantization: 4-bit NF4 via BitsAndBytes
  • Training Framework: TRL + Unsloth

Uses

Direct Use

Use this model for generating responses to medical questions, including:

  • Diagnosing symptoms
  • Explaining treatments
  • Summarizing clinical findings
  • Answering patient queries

Downstream Use

The model can be used as a base for:

  • Medical chatbots
  • Educational tools for students
  • Clinical decision support systems

Out-of-Scope Use

This model should not be used for:

  • Final medical diagnosis without human oversight
  • Emergency health advice
  • Legal or binding decisions

Bias, Risks, and Limitations

As with any language model, there may be cases where:

  • Responses are incorrect or misleading
  • Biases in training data affect output
  • Medical advice lacks nuance or context

Always verify critical information with trained professionals or authoritative sources.

Recommendations

Users should treat this model as an assistant, not a replacement for professional medical advice.


How to Get Started with the Model

Install Required Libraries

For Google Colab:

1. Check the Python version

```python
!python --version
```

2. Clean the pip cache

```python
!pip cache purge
```

3. Install dependencies

```python
!pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
!pip install transformers accelerate
!pip install bitsandbytes
!pip install -U peft
!pip install huggingface_hub[hf_xet]
```
4. Check the installed torch versions

```python
import torch
import torchvision
import torchaudio

print(f"Torch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Torchaudio version: {torchaudio.__version__}")
```
5. Check whether a CUDA device is available

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU found")
```
6. Run a quick GPU sanity check

```python
# Create a tensor and move it to the GPU
tensor = torch.randn(1000, 1000).cuda()

# Perform a matrix multiplication on the GPU
result = torch.matmul(tensor, tensor)

print(f"Result shape: {result.shape}")
```
7. Move a sample model to the GPU

```python
import torch.nn as nn

# Sample neural network
model = nn.Sequential(
    nn.Linear(1000, 500),
    nn.ReLU(),
    nn.Linear(500, 10)
)

# Move the model to the GPU
model = model.cuda()

# Sample input data (1000 samples, 1000 features)
inputs = torch.randn(1000, 1000).cuda()

# Forward pass
output = model(inputs)
print(output.shape)
```
8. Load the fine-tuned model and tokenizer

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "himel06/DoctorHimel_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model directly, without a separate adapter or LoRA configuration
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)
```
9. Define the prompt template

```python
prompt_template = """
Below is a medical question. Please provide a detailed and accurate response based on your knowledge.

### Question:
{}

### Answer:
"""
```
10. Format an example question

```python
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

input_text = prompt_template.format(question)
```
11. Generate and print the response

```python
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=400,                  # cap on generated tokens
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    temperature=0.7,                     # mild sampling randomness
    do_sample=True,
)

# Decode and strip the prompt so only the model's answer is printed
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.replace(input_text, "").strip())
```