Model Card: Gemma-3-270M BioInstruct LoRA (POC)
Model Details
Model Description
This model is a proof-of-concept fine-tune of the 270M-parameter Gemma-3 model on biomedical instruction data.
It was fine-tuned using the bio-nlp-umass/bioinstruct dataset, reformatted into a chat-like structure (Instruction / Input / Answer) to align with instruction-following behavior.
- Developed by: Kunj Shah
- Model type: Decoder-only causal LM (LoRA fine-tuned)
- Language(s): English (biomedical domain)
- License: Apache-2.0 (inherits from base Gemma-3)
- Base model: google/gemma-3-270m
- Finetuning method: Parameter-efficient LoRA adapters (attention + MLP projections)
- Status: Minimal proof of concept (not production-ready)
Model Sources
- Repository: (fill with your HF repo link once pushed)
- Demo / Endpoint: Served via vLLM for efficient inference
Uses
Direct Use
- Biomedical text simplification
- Summarization of clinical notes into lay terms
- Identifying medications or clinical entities
- General instruction-following on medical prompts
Downstream Use
- Further fine-tuning on specialized biomedical tasks (NER, relation extraction, QA)
- Integration into biomedical RAG (Retrieval-Augmented Generation) systems
Out-of-Scope Use
- Production clinical decision support
- Any diagnostic or therapeutic use without human oversight
- General domain tasks outside biomedical text (not aligned for non-medical use)
Bias, Risks, and Limitations
- Domain bias: Trained only on biomedical instructions; may hallucinate outside domain.
- Not reliable for clinical care: Outputs must not be used for patient-facing decisions.
- Small model size (270M): Limited reasoning and factual accuracy compared to larger LMs.
Recommendations
Use strictly for research and experimentation. Do not deploy in production medical settings. Pair with RAG or external validation for any downstream pipeline.
How to Get Started
Inference with Transformers + PEFT
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

model_name = "google/gemma-3-270m"
adapter_dir = "kunj/gemma3-270m-bioinstruct-lora"  # replace with your HF repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model, then attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()

prompt = "Instruction: Summarize this clinical note.\nInput: Patient with hypertension and diabetes admitted with dyspnea. Echocardiogram shows EF 30%.\nAnswer: "
enc = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
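If a standalone checkpoint is preferred (for example, for serving stacks that expect merged weights), the LoRA adapter can be folded into the base model. A minimal sketch using PEFT's merge_and_unload; the output directory name is illustrative:

merged = model.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("gemma3-270m-bioinstruct-merged")
tokenizer.save_pretrained("gemma3-270m-bioinstruct-merged")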
Inference with vLLM
Because the repository ships LoRA adapters rather than merged weights, serve the base model and attach the adapter (flags assume a vLLM build with LoRA support):
vllm serve google/gemma-3-270m \
  --enable-lora \
  --lora-modules bioinstruct=kunj/gemma3-270m-bioinstruct-lora \
  --dtype bfloat16 \
  --max-model-len 2048
Then query the endpoint with your biomedical instruction.
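A minimal client sketch against the OpenAI-compatible completions endpoint that vLLM exposes. The host/port (localhost:8000), the "bioinstruct" model name registered via --lora-modules above, and the prompt content are assumptions for illustration:

import requests

prompt = (
    "Instruction: Identify the medications mentioned.\n"
    "Input: Patient started on metformin 500 mg BID and lisinopril 10 mg daily.\n"
    "Answer: "
)

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default vLLM server address (assumption)
    json={
        "model": "bioinstruct",              # LoRA module name registered with --lora-modules
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])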
Training Details
Training Data
- Dataset: bio-nlp-umass/bioinstruct
- Preprocessing: Reformatted into chat-style records with explicit Instruction, Input, and Answer fields (see the sketch below).
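A minimal sketch of this reformatting, assuming the dataset exposes instruction / input / output columns (the exact column names are an assumption, not stated in this card):

from datasets import load_dataset

def to_chat_text(example):
    # Build the Instruction / Input / Answer layout used throughout this card.
    parts = [f"Instruction: {example['instruction']}"]
    if example.get("input"):
        parts.append(f"Input: {example['input']}")
    parts.append(f"Answer: {example['output']}")
    return {"text": "\n".join(parts)}

dataset = load_dataset("bio-nlp-umass/bioinstruct", split="train")
dataset = dataset.map(to_chat_text, remove_columns=dataset.column_names)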
Training Procedure
- Method: LoRA fine-tuning (attention + MLP projections)
- Sequence length: 2048 (with packing)
- Batching: 16 sequences per device × 8 gradient-accumulation steps = effective batch of 128 sequences/step
- Epochs: 3
- Optimizer: AdamW (fused), cosine LR schedule
- Learning rate: 5e-5
- Precision: bf16 mixed precision on A100
- Gradient checkpointing: Enabled
- Attention implementation: FlashAttention-2
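A configuration sketch consistent with the settings listed above. The LoRA rank, alpha, and dropout values are assumptions (they are not stated in this card), and the sequence packing / data collation step is omitted:

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # FlashAttention-2, as noted above
)

lora_cfg = LoraConfig(
    r=16,                    # assumption: rank not stated in the card
    lora_alpha=32,           # assumption
    lora_dropout=0.05,       # assumption
    target_modules=[         # attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

args = TrainingArguments(
    output_dir="gemma3-270m-bioinstruct-lora",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # effective 128 sequences/step
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)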
Evaluation
This POC was not benchmarked on standard biomedical leaderboards. It was sanity-checked on held-out examples from bioinstruct: outputs show coherent simplifications and medication extraction, but exhibit the hallucinations typical of small LMs.
Environmental Impact
- Hardware: NVIDIA A100 40GB
- Sequence length: 2048
- Training epochs: 3
Technical Specifications
- Architecture: Gemma-3 (decoder-only transformer, 270M parameters)
- Objective: Causal LM loss with masked labels (user/system ignored, assistant supervised); see the sketch after this list
- Compute Infrastructure: Single A100 GPU, Hugging Face Transformers + PEFT + FlashAttention-2
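A minimal sketch of that masked-label objective: prompt tokens receive the ignore index so only the answer span contributes to the loss. How prompt_len (the prompt/answer boundary) is computed is an assumption not detailed in this card:

import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    # Copy the token ids and mask everything before the answer span,
    # so only the assistant (Answer) tokens are supervised.
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels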
Model Card Contact
- Author: Kunj Shah
- Contact: Portfolio