DoctorHimel_V1
This model was fine-tuned using LoRA (Low-Rank Adaptation), a technique that introduces low-rank matrices to adapt pre-trained models to specific tasks.
It is a fine-tuned version of the unsloth/gemma-2b-it-bnb-4bit model, specialized for answering medical and clinical questions.
It was fine-tuned using Hugging Face's TRL library and accelerated with Unsloth, allowing faster training and inference while maintaining low memory usage through 4-bit quantization.
The model was adapted using LoRA, which enables efficient fine-tuning by updating only a small subset of trainable parameters. This approach drastically reduces VRAM consumption and speeds up training without sacrificing performance, making it ideal for resource-constrained environments like Google Colab.
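For illustration, here is a minimal sketch of what such a LoRA setup looks like with the PEFT library. The hyperparameters below (rank, alpha, target modules) are assumptions for demonstration, not the exact values used to train this model:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical hyperparameters, shown for illustration only; they are not
# the published training configuration of DoctorHimel_V1.
base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-2b-it-bnb-4bit")
lora_config = LoraConfig(
    r=16,                    # rank of the low-rank update matrices
    lora_alpha=16,           # scaling factor applied to the LoRA update
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```

Because only the injected low-rank matrices receive gradients, the optimizer state stays small, which is where most of the VRAM savings come from.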
Model Details
Model Description
This model is built on top of Google’s Gemma 2B instruction-tuned variant (gemma-2b-it), further optimized with 4-bit NF4 quantization (bnb-nf4) to reduce memory consumption and improve inference speed on consumer hardware.
The model has been fine-tuned specifically for medical Q&A tasks and can assist with diagnostic reasoning, symptom analysis, treatment suggestions, and more.
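As a sketch of what NF4 loading looks like with BitsAndBytes, assuming you want to apply the 4-bit settings yourself at load time (the compute dtype and double-quantization flags are illustrative assumptions, not this card's published settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative NF4 configuration; these flags are assumptions for demonstration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "himel06/DoctorHimel_V1",
    quantization_config=bnb_config,
    device_map="auto",
)
```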
- Developed by: Himel
- Finetuned from model: unsloth/gemma-2b-it-bnb-4bit
- Model type: Causal Language Model (Decoder-only)
- Language(s) (NLP): English
- License: Apache-2.0
- Quantization: 4-bit NF4 via BitsAndBytes
- Training Framework: TRL + Unsloth
Uses
Direct Use
Use this model for generating responses to medical questions (a minimal example follows this list), including:
- Diagnosing symptoms
- Explaining treatments
- Summarizing clinical findings
- Answering patient queries
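As a quick sanity check, here is a minimal sketch using the transformers pipeline API; the question text is just an example, and the full walkthrough below gives finer control over generation:

```python
from transformers import pipeline

# Minimal quick-start; see "How to Get Started with the Model" for details.
generator = pipeline(
    "text-generation",
    model="himel06/DoctorHimel_V1",
    device_map="auto",
)
prompt = "### Question:\nWhat are common symptoms of iron-deficiency anemia?\n### Answer:\n"
result = generator(prompt, max_new_tokens=200)
print(result[0]["generated_text"])
```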
Downstream Use
Can be used as a base for:
- Medical chatbots
- Educational tools for students
- Clinical decision support systems
Out-of-Scope Use
This model should not be used for:
- Final medical diagnosis without human oversight
- Emergency health advice
- Legal or binding decisions
Bias, Risks, and Limitations
As with any language model, there may be cases where:
- Responses are incorrect or misleading
- Biases in training data affect output
- Medical advice lacks nuance or context
Always verify critical information with trained professionals or authoritative sources.
Recommendations
Users should treat this model as an assistant, not a replacement for professional medical advice.
How to Get Started with the Model
Install Required Libraries
The following steps assume a Google Colab environment.
- Check Python version

```python
!python --version
```

- Clean the pip cache

```python
!pip cache purge
```

- Install dependencies

```python
!pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0
!pip install transformers accelerate
!pip install bitsandbytes
!pip install -U peft
!pip install "huggingface_hub[hf_xet]"
```
- Check installed versions

```python
import torch
import torchvision
import torchaudio

print(f"Torch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Torchaudio version: {torchaudio.__version__}")
```
- Check if a CUDA device is available

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")
# Only query the device when one exists; these calls raise on CPU-only machines
if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU found")
```
- Run a quick matrix multiplication on the GPU

```python
import torch

# Create a tensor and move it to the GPU
tensor = torch.randn(1000, 1000).cuda()

# Perform a matrix multiplication on the GPU
result = torch.matmul(tensor, tensor)
print(f"Result shape: {result.shape}")  # torch.Size([1000, 1000])
```
- Move a model to the GPU

```python
import torch
import torch.nn as nn

# Sample neural network
model = nn.Sequential(
    nn.Linear(1000, 500),
    nn.ReLU(),
    nn.Linear(500, 10),
)

# Move the model to the GPU
model = model.cuda()

# Sample input data (1000 samples, 1000 features)
inputs = torch.randn(1000, 1000).cuda()

# Forward pass
output = model(inputs)
print(output.shape)  # torch.Size([1000, 10])
```
- Load the model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "himel06/DoctorHimel_V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model directly, without separate adapters or LoRA configuration
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)
```
- Prompt template

```python
prompt_template = """
Below is a medical question. Please provide a detailed and accurate response based on your knowledge.
### Question:
{}
### Answer:
"""
```
- Example question

```python
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

input_text = prompt_template.format(question)
```
- Generate a response

```python
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask together
    max_new_tokens=400,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    temperature=0.7,
    do_sample=True,
)

# Decode and strip the prompt so only the model's answer is printed
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.replace(input_text, "").strip())
```