Model Card for HealthGPT-TinyLlama

This model is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 on a custom medical dataset. It was developed to serve as a lightweight, domain-specific assistant capable of answering medical questions fluently and coherently.

Model Details

Model Description

HealthGPT-TinyLlama is a 1.1B parameter model fine-tuned using LoRA adapters for the task of medical question answering. The base model is TinyLlama, a compact transformer architecture optimized for performance and efficiency.

  • Developed by: Selina Zarzour
  • Shared by: selinazarzour
  • Model type: Causal Language Model
  • Language(s): English
  • License: apache-2.0
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Model Sources

  • Repository: https://huggingface.co/selinazarzour/healthgpt-tinyllama

Uses

Direct Use

  • Designed to answer general medical questions.
  • Intended for educational and experimental use.

Out-of-Scope Use

  • Not suitable for clinical decision-making or professional diagnosis.
  • Should not be relied on for life-critical use cases.

Bias, Risks, and Limitations

  • The model may hallucinate or provide medically inaccurate information.
  • It has not been validated against real-world clinical data.
  • Biases present in the training dataset may persist.

Recommendations

  • Always verify model outputs with qualified professionals.
  • Do not use in scenarios where safety is critical.

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and the base model's tokenizer.
model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Run on GPU when available; the model has only been tested with GPU access.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# The model was trained on a "### Question:" / "### Answer:" prompt format.
prompt = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
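
In GPU environments, the merged weights can also be loaded in half precision to reduce memory use. This is a minimal sketch, not a configuration shipped with the model; the torch_dtype and device_map values are suggestions (device_map="auto" additionally requires the accelerate package).

import torch
from transformers import AutoModelForCausalLM

# Suggested half-precision GPU loading; these settings are assumptions, not documented values.
model = AutoModelForCausalLM.from_pretrained(
    "selinazarzour/healthgpt-tinyllama",
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)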

Training Details

Training Data

  • Finetuned on a synthetic dataset composed of medical questions and answers derived from reliable medical knowledge sources.

Training Procedure

  • LoRA adapter training using Hugging Face PEFT and Transformers (see the sketch below)
  • LoRA adapters merged into the base weights after training
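
A minimal sketch of this procedure is shown below, assuming illustrative LoRA hyperparameters (rank, alpha, dropout, and the attention projection names); the values actually used are not published with this card.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Illustrative LoRA settings; the actual rank/alpha/dropout are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# ... supervised fine-tuning runs here (see the hyperparameters below) ...

# After training, the adapters are merged back into the base weights and saved.
merged = model.merge_and_unload()
merged.save_pretrained("healthgpt-tinyllama-merged")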

Training Hyperparameters

  • Precision: float16 mixed precision
  • Epochs: 3
  • Optimizer: AdamW
  • Batch size: 4
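
A hedged sketch of how these settings map onto the Hugging Face Trainer; the learning rate, output directory, and logging settings are placeholders rather than reported values.

from transformers import TrainingArguments

# Reported: fp16 mixed precision, 3 epochs, AdamW, per-device batch size 4.
training_args = TrainingArguments(
    output_dir="healthgpt-tinyllama-lora",  # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=4,
    fp16=True,
    optim="adamw_torch",
    learning_rate=2e-4,  # assumption, not a reported value
    logging_steps=10,
)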

Evaluation

Testing Data, Factors & Metrics

  • Testing done manually by querying the model with unseen questions.
  • Sample outputs evaluated for relevance, grammar, and factual accuracy.

Results

  • The model produces relevant and coherent answers in most cases.
  • It performs best on short, fact-based questions.

Model Examination

Screenshot of the local Gradio app interface (image not reproduced in this card).

Note: The model was not deployed publicly because inference requires a GPU, but it runs successfully in local environments with GPU access.


Environmental Impact

  • Hardware Type: Google Colab GPU (T4/A100)
  • Hours used: ~3
  • Cloud Provider: Google Cloud via Colab
  • Compute Region: US (unknown exact zone)
  • Carbon Emitted: Unknown

Technical Specifications

Model Architecture and Objective

  • LlamaForCausalLM with 22 layers, 32 attention heads, and a hidden size of 2048
  • LoRA finetuning applied to attention layers only
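
These figures can be verified from the published config, for example:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("selinazarzour/healthgpt-tinyllama")
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)
# Expected: 22 32 2048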

Compute Infrastructure

  • Hardware: Colab GPU

  • Software:

    • transformers 4.39+
    • peft
    • bitsandbytes (for initial quantized training)
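
For the initial quantized training pass, bitsandbytes is used to load the base model in reduced precision. The sketch below shows a typical 4-bit setup; whether 4-bit or 8-bit quantization was used, and with which settings, is not documented, so these values are assumptions.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit NF4 settings; the configuration actually used is not published.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)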

Citation

APA: Zarzour, S. (2025). HealthGPT-TinyLlama: A fine-tuned 1.1B LLM for medical Q&A.

Model Card Contact

  • Contact: Selina Zarzour via Hugging Face (@selinazarzour)

Note: This model is a prototype and not intended for clinical use.
