Model Card: Gemma-3-270M BioInstruct LoRA (POC)
Model Details
Model Description
This model is a proof-of-concept fine-tune of the 270M-parameter Gemma-3 model on biomedical instruction data.
It was fine-tuned using the bio-nlp-umass/bioinstruct dataset, reformatted into a chat-like structure (Instruction / Input / Answer) to align with instruction-following behavior.
- Developed by: Kunj Shah
- Model type: Decoder-only causal LM (LoRA fine-tuned)
- Language(s): English (biomedical domain)
- License: Apache-2.0 (inherits from base Gemma-3)
- Base model: google/gemma-3-270m
- Finetuning method: Parameter-efficient LoRA adapters (attention + MLP projections)
- Status: Minimal proof of concept (not production-ready)
Model Sources
- Repository: (fill with your HF repo link once pushed)
- Demo / Endpoint: Served via vLLM for efficient inference
Uses
Direct Use
- Biomedical text simplification
- Summarization of clinical notes into lay terms
- Identifying medications or clinical entities
- General instruction-following on medical prompts
Downstream Use
- Further fine-tuning on specialized biomedical tasks (NER, relation extraction, QA)
- Integration into biomedical RAG (Retrieval-Augmented Generation) systems
Out-of-Scope Use
- Production clinical decision support
- Any diagnostic or therapeutic use without human oversight
- General domain tasks outside biomedical text (not aligned for non-medical use)
Bias, Risks, and Limitations
- Domain bias: Trained only on biomedical instructions; may hallucinate outside domain.
- Not reliable for clinical care: Outputs must not be used for patient-facing decisions.
- Small model size (270M): Limited reasoning and factual accuracy compared to larger LMs.
Recommendations
Use strictly for research and experimentation. Do not deploy in production medical settings. Pair with RAG or external validation for any downstream pipeline.
How to Get Started
Inference with Transformers + PEFT
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

model_name = "google/gemma-3-270m"
adapter_dir = "kunj/gemma3-270m-bioinstruct-lora"  # replace with your HF repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model, then attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()

prompt = "Instruction: Summarize this clinical note.\nInput: Patient with hypertension and diabetes admitted with dyspnea. Echocardiogram shows EF 30%.\nAnswer: "
enc = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
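If a standalone checkpoint is preferred (for example, for serving stacks that expect merged weights), the LoRA adapter can be folded into the base model. A minimal sketch using PEFT's merge_and_unload; the output directory name is illustrative:

merged = model.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("gemma3-270m-bioinstruct-merged")
tokenizer.save_pretrained("gemma3-270m-bioinstruct-merged")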
Inference with vLLM
Because the repository ships LoRA adapters rather than merged weights, serve the base model and attach the adapter (flags assume a vLLM build with LoRA support):
vllm serve google/gemma-3-270m \
  --enable-lora \
  --lora-modules bioinstruct=kunj/gemma3-270m-bioinstruct-lora \
  --dtype bfloat16 \
  --max-model-len 2048
Then query the endpoint with your biomedical instruction.
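A minimal client sketch against the OpenAI-compatible completions endpoint that vLLM exposes. The host/port (localhost:8000), the "bioinstruct" model name registered via --lora-modules above, and the prompt content are assumptions for illustration:

import requests

prompt = (
    "Instruction: Identify the medications mentioned.\n"
    "Input: Patient started on metformin 500 mg BID and lisinopril 10 mg daily.\n"
    "Answer: "
)

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default vLLM server address (assumption)
    json={
        "model": "bioinstruct",              # LoRA module name registered with --lora-modules
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])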
Training Details
Training Data
- Dataset: bio-nlp-umass/bioinstruct
- Preprocessing: Reformatted into chat-style records with explicit Instruction, Input, and Answer fields (see the sketch below).
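A minimal sketch of this reformatting, assuming the dataset exposes instruction / input / output columns (the exact column names are an assumption, not stated in this card):

from datasets import load_dataset

def to_chat_text(example):
    # Build the Instruction / Input / Answer layout used throughout this card.
    parts = [f"Instruction: {example['instruction']}"]
    if example.get("input"):
        parts.append(f"Input: {example['input']}")
    parts.append(f"Answer: {example['output']}")
    return {"text": "\n".join(parts)}

dataset = load_dataset("bio-nlp-umass/bioinstruct", split="train")
dataset = dataset.map(to_chat_text, remove_columns=dataset.column_names)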
Training Procedure
- Method: LoRA fine-tuning (attention + MLP projections)
- Sequence length: 2048 (with packing)
- Batching: 16 sequences per device × 8 gradient-accumulation steps = effective batch of 128 sequences/step
- Epochs: 3
- Optimizer: AdamW (fused), cosine LR schedule
- Learning rate: 5e-5
- Precision: bf16 mixed precision on A100
- Gradient checkpointing: Enabled
- Attention implementation: FlashAttention-2
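A configuration sketch consistent with the settings listed above. The LoRA rank, alpha, and dropout values are assumptions (they are not stated in this card), and the sequence packing / data collation step is omitted:

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # FlashAttention-2, as noted above
)

lora_cfg = LoraConfig(
    r=16,                    # assumption: rank not stated in the card
    lora_alpha=32,           # assumption
    lora_dropout=0.05,       # assumption
    target_modules=[         # attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

args = TrainingArguments(
    output_dir="gemma3-270m-bioinstruct-lora",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # effective 128 sequences/step
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)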
Evaluation
This POC was not benchmarked on standard biomedical leaderboards. It was sanity-checked on held-out examples from bioinstruct: outputs show coherent simplifications and medication extraction, but exhibit the hallucinations typical of small LMs.
Environmental Impact
- Hardware: NVIDIA A100 40GB
- Sequence length: 2048
- Training epochs: 3
Technical Specifications
- Architecture: Gemma-3 (decoder-only transformer, 270M parameters)
- Objective: Causal LM loss with masked labels (user/system ignored, assistant supervised); see the sketch after this list
- Compute Infrastructure: Single A100 GPU, Hugging Face Transformers + PEFT + FlashAttention-2
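A minimal sketch of that masked-label objective: prompt tokens receive the ignore index so only the answer span contributes to the loss. How prompt_len (the prompt/answer boundary) is computed is an assumption not detailed in this card:

import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    # Copy the token ids and mask everything before the answer span,
    # so only the assistant (Answer) tokens are supervised.
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels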
Model Card Contact
- Author: Kunj Shah
- Contact: Portfolio