Libraries

How to use vmal/med-advisor-4b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vmal/med-advisor-4b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vmal/med-advisor-4b")
model = AutoModelForCausalLM.from_pretrained("vmal/med-advisor-4b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use vmal/med-advisor-4b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vmal/med-advisor-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vmal/med-advisor-4b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
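
The same request can also be sent from Python. The sketch below uses only the standard library and assumes a server is already listening at the address used in the curl example. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI chat completions shape.

```python
import json
import urllib.request


def build_chat_request(prompt, model="vmal/med-advisor-4b",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumption: the response body has the OpenAI chat completions layout,
    i.e. choices[0].message.content holds the reply.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` requires the server from the previous step to be running. Pass a different `base_url` if the server listens on another host or port.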

Use Docker

docker model run hf.co/vmal/med-advisor-4b

SGLang

How to use vmal/med-advisor-4b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vmal/med-advisor-4b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vmal/med-advisor-4b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vmal/med-advisor-4b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vmal/med-advisor-4b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use vmal/med-advisor-4b with Docker Model Runner:
```
docker model run hf.co/vmal/med-advisor-4b
```

med-advisor-4b

med-advisor-4b is a 4B-parameter chat model for medical and scientific education built on Qwen/Qwen3-4B-Base.

It is designed to explain concepts clearly, adapt to different audiences, and hold boundaries on diagnosis, medication dosing, and other personal medical decision-making. The default release on `main` is the Phase 3 checkpoint, the latest version of the model in this repository. Earlier checkpoints are listed under Accessing Earlier Revisions.

This model is for general education, not clinical use. It should not be used to diagnose, prescribe, dose medications, interpret personal medical data as a clinical conclusion, or replace clinician judgment.

Capabilities

Explaining medical and scientific concepts in plain language
Adapting explanations to different audiences, including patients, students, caregivers, and healthcare workers
Answering educational questions while maintaining policy boundaries
Refusing high-risk requests more cleanly than the Phase 2 checkpoint

Out of Scope

This model should not be used for:

diagnosis
treatment planning
medication dosing or titration
interpreting an individual's labs, imaging, pathology, or symptoms as a clinical conclusion
emergency triage in place of local emergency services
deployment as a standalone medical advice system

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vmal/med-advisor-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an evidence-aware medical and scientific explainer. "
            "Give clear general education with appropriate uncertainty. "
            "Do not diagnose, prescribe, dose, interpret personal medical data "
            "as a clinical conclusion, or replace clinician judgment. "
            "Escalate emergencies immediately."
        ),
    },
    {
        "role": "user",
        "content": "Explain the difference between Type 1 and Type 2 diabetes twice: first for a worried family member, then for a first-year medical student.",
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,  # pass input_ids together with the attention mask
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.10,
    no_repeat_ngram_size=6,
)

print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

The model uses the Qwen chat template. Boundary behavior is most reliable when the system prompt clearly states that the assistant is providing general education only.
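
One way to keep that system prompt consistent across calls is to build the message list through a small helper. The following is a minimal sketch: `GENERAL_EDUCATION_PROMPT` reuses the system prompt text from the Quickstart, while the function name `build_messages` and its `history` parameter are hypothetical and not part of the model or of Transformers.

```python
# System prompt text copied from the Quickstart example.
GENERAL_EDUCATION_PROMPT = (
    "You are an evidence-aware medical and scientific explainer. "
    "Give clear general education with appropriate uncertainty. "
    "Do not diagnose, prescribe, dose, interpret personal medical data "
    "as a clinical conclusion, or replace clinician judgment. "
    "Escalate emergencies immediately."
)


def build_messages(user_text, history=None, system_prompt=GENERAL_EDUCATION_PROMPT):
    """Return a chat message list that always starts with the system prompt.

    `history` is an optional list of earlier user/assistant messages
    (hypothetical parameter, for multi-turn use).
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.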

Example Prompts and Outputs

Example 1: Evidence Quality

Question

What does it mean when people say correlation does not prove causation in nutrition studies? Give a practical example and explain how to think about evidence quality.

Answer

When people say "correlation does not prove causation," they're pointing out that just because two things happen together (are correlated) doesn't necessarily mean one causes the other. In nutrition studies, this is important because many factors can influence health outcomes at once.

For example: A study finds that people who eat more fruits and vegetables also tend to have lower rates of heart disease. This could be due to several reasons:

Fruits and veggies might actually reduce risk directly.

People eating healthier overall may avoid smoking, exercise more, etc., which lowers risk too.

The study design might miss confounding variables like income or genetics affecting both diet and outcome.

To think about evidence quality:

Look for randomized controlled trials (RCTs), where participants are randomly assigned to different diets, if possible.

Consider observational studies carefully; check whether researchers adjusted for known confounders.

Ask what the effect size is — small effects are harder to show than large ones.

Check publication bias by looking at meta-analyses combining multiple studies.

Be cautious about headlines promising specific foods cure diseases without strong supporting evidence.

In short, while correlations can suggest interesting patterns, establishing cause requires stronger designs and careful analysis.

Example 2: Audience Adaptation

Question

Explain the difference between Type 1 and Type 2 diabetes twice: first for a worried family member, then for a first-year medical student.

Answer

For a worried family member: Type 1 diabetes is when your body doesn't make insulin at all because it attacks its own cells that produce this important hormone. It's usually diagnosed in children and young adults but can happen anytime. People with type 1 need to take insulin every day through shots or pumps.

Type 2 diabetes happens when your body either makes too little insulin or becomes resistant to it. This often develops over time and is more common in older people who have been overweight or inactive. Lifestyle changes like diet and exercise can help manage it, sometimes along with medication if needed.

For a first-year medical student: In type 1 diabetes, there's an autoimmune destruction of pancreatic beta cells leading to absolute insulin deficiency; patients require lifelong exogenous insulin therapy. The onset is typically rapid and occurs before age 40 (though it can occur later).

In contrast, type 2 diabetes involves relative insulin resistance plus impaired insulin secretion from the pancreas. Risk factors include obesity, sedentary lifestyle, and genetic predisposition. Management starts with non-pharmacological interventions such as weight loss and increased physical activity, progressing to oral hypoglycemic agents or injectable medications if glycemia remains uncontrolled despite these measures.

Example 3: Emergency Escalation

Question

My dad has crushing chest pain, is sweating, and feels short of breath. What should I do right now?

Answer

I'm sorry to hear that your dad is experiencing these symptoms. Crushing chest pain, sweating, and shortness of breath can be signs of a heart attack or other serious conditions, so it's important to act quickly.

Please call emergency services (911 in the United States) immediately. If possible, have someone else call while you stay with him and follow their instructions. While waiting for help, try to keep him calm and sitting down if he can tolerate it; avoid giving him any food, drink, or medication unless instructed by emergency responders.

If he becomes unresponsive or stops breathing normally, begin CPR if you know how to perform it and use an automated external defibrillator (AED) if available. Remember, time is critical in situations like this, so get professional help as soon as possible.

Training Overview

This model was built in three phases:

Phase 1: Medical Instruction Tuning

Phase 1 trained the base model on a curated medical instruction set to establish medical-domain behavior and explanation quality.

Phase 2: Persona Refinement

Phase 2 refined the model for:

deeper explanations
better audience adaptation
improved hedging and evidence framing
warmer and more natural structure

Phase 3: Policy Alignment with DPO

Phase 3 applied full-model DPO on top of the Phase 2 checkpoint to improve:

refusal sharpness
redirect correctness
emergency escalation
adversarial boundary holding

Evaluation Summary

The checkpoint was evaluated on two suites: a frozen policy regression suite and a separate persona regression suite. Both evaluations used gpt-5-mini as judge with reasoning_effort=low.

Frozen Policy Regression

Phase 3 was compared against the Phase 2 checkpoint and the original Phase 1 checkpoint on the same frozen suite.

Model	Overall	Safety	Helpfulness	Medical Accuracy	Boundary Violation	Mode Incorrect
Phase 3	4.5449	4.7885	4.5406	4.6880	3.85%	4.91%
Phase 2	4.4850	4.7436	4.4744	4.6090	5.77%	6.84%
Phase 1	4.5064	4.7714	4.4573	4.6389	3.85%	4.70%

Relative to Phase 2, Phase 3 reduced the two key failure rates:

Boundary violation: 5.77% -> 3.85%
Mode incorrect: 6.84% -> 4.91%
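
To put those changes in relative terms, the arithmetic can be sketched as follows. The rates come from the table above; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    """Fraction by which a rate dropped, relative to its starting value."""
    return (before - after) / before


# (Phase 2 rate, Phase 3 rate) in percent, taken from the table above.
rates = {
    "Boundary violation": (5.77, 3.85),
    "Mode incorrect": (6.84, 4.91),
}

for name, (phase2, phase3) in rates.items():
    absolute = phase2 - phase3
    relative = relative_reduction(phase2, phase3)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.1%} relative")
```

Under these numbers, the boundary violation rate falls by about a third (33.3%) and the mode-incorrect rate by a bit over a quarter (28.2%).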

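Relative to Phase 1, Phase 3 is stronger on overall quality (4.5449 vs. 4.5064), helpfulness (4.5406 vs. 4.4573), and medical accuracy (4.6880 vs. 4.6389). The boundary violation rate is unchanged at 3.85%, while the mode-incorrect rate remains slightly worse (4.91% vs. 4.70%).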

External Baselines

For external context, the same frozen regression suite was also run on the original Qwen base and instruct checkpoints:

Model	Overall	Boundary Violation	Mode Incorrect
Qwen3-4B-Base	3.66	26.24%	28.39%
Qwen3-4B-Instruct	4.04	27.31%	24.52%
med-advisor-4b Phase 3	4.54	3.85%	4.91%

This is the main reason to use med-advisor-4b instead of the off-the-shelf base or instruct model for medical education: the Phase 3 checkpoint is much better at holding medical policy boundaries while remaining useful as an explainer.

Persona Regression

Model	Overall	Depth	Audience	Warmth	Structure	Hedging	Verbosity	Evidence	Multi-turn
Phase 3	4.1042	3.7292	4.5208	4.2083	4.3958	4.2708	4.6875	3.7083	4.6667
Phase 2	4.0000	3.5833	4.3750	4.2292	4.3750	4.1458	4.6250	3.5208	5.0000
Phase 1	3.6458	3.4583	4.1458	4.0417	4.1250	3.6875	4.2500	3.1458	4.6667

Relative to Phase 2, Phase 3 is a net-positive persona update:

better overall persona quality
better depth
better audience adaptation
better structure and hedging
better verbosity control
better evidence framing

Small regressions remain in:

warmth
multi-turn consistency
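
The per-dimension comparison can be reproduced directly from the persona table. This is a small sketch using the Phase 3 and Phase 2 rows as printed above; the variable names are illustrative.

```python
DIMENSIONS = ["Overall", "Depth", "Audience", "Warmth", "Structure",
              "Hedging", "Verbosity", "Evidence", "Multi-turn"]
# Rows copied from the Persona Regression table.
PHASE_3 = [4.1042, 3.7292, 4.5208, 4.2083, 4.3958, 4.2708, 4.6875, 3.7083, 4.6667]
PHASE_2 = [4.0000, 3.5833, 4.3750, 4.2292, 4.3750, 4.1458, 4.6250, 3.5208, 5.0000]

# Positive delta means Phase 3 scored higher than Phase 2.
deltas = {
    dim: round(p3 - p2, 4)
    for dim, p3, p2 in zip(DIMENSIONS, PHASE_3, PHASE_2)
}
improved = [dim for dim, delta in deltas.items() if delta > 0]
regressed = [dim for dim, delta in deltas.items() if delta < 0]

for dim, delta in deltas.items():
    print(f"{dim:<10} {delta:+.4f}")
```

The two negative deltas are Warmth and Multi-turn, which matches the regressions listed above; Multi-turn is the larger of the two.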

Safety and Limitations

This model is a medical education model, not a clinical system. It still has meaningful limitations:

like any language model, it can generate inaccurate, incomplete, outdated, or misleading information, including content that sounds confident
it can remain too shallow on some prompts
emergency escalation is improved but remains one of the model's main weaknesses
some refusal answers can still leak guideline-like dosing content
repetition can appear on long, refusal-heavy generations if decoding is loose

Recommended decoding for safer, more stable output:

do_sample=False
repetition_penalty=1.10 to 1.15
no_repeat_ngram_size=6
a clear system prompt that restricts the model to general education

These settings reduced repetition in local testing, but they are not a substitute for external safety review.
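
As a sketch, the recommended settings can be collected in one place and reused for every call. The function name `recommended_generation_kwargs` is hypothetical, and the `max_new_tokens` default is borrowed from the Quickstart rather than from the recommendation itself. The keys are standard `generate` arguments in Transformers.

```python
def recommended_generation_kwargs(repetition_penalty=1.10, max_new_tokens=1024):
    """Return decoding settings in the range recommended above.

    Raises ValueError if repetition_penalty falls outside 1.10 to 1.15.
    """
    if not 1.10 <= repetition_penalty <= 1.15:
        raise ValueError("repetition_penalty should be between 1.10 and 1.15")
    return {
        "do_sample": False,            # greedy decoding
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 6,
        "max_new_tokens": max_new_tokens,  # assumption: value from the Quickstart
    }
```

With a loaded model and tokenized inputs, the settings would be applied as `model.generate(**inputs, **recommended_generation_kwargs())`.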

Accessing Earlier Revisions

Earlier checkpoints remain available in this repository's history:

Phase	Description	Revision
Phase 1	Medical checkpoint	`193afbea53c34b2bdc9c493411d10d94b58da486`
Phase 2	Persona-refined checkpoint	`285617171e95fd98983e231f8d69652dce50e964`
Phase 3	Current default checkpoint	`main`
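
A specific checkpoint can be selected with the `revision` argument of `from_pretrained`. The sketch below maps phase names to the revisions in the table; `REVISIONS` and `load_checkpoint` are illustrative names, and loading requires `transformers` to be installed and the repository to be reachable.

```python
MODEL_ID = "vmal/med-advisor-4b"

# Revisions copied from the table above.
REVISIONS = {
    "Phase 1": "193afbea53c34b2bdc9c493411d10d94b58da486",
    "Phase 2": "285617171e95fd98983e231f8d69652dce50e964",
    "Phase 3": "main",
}


def load_checkpoint(phase="Phase 3"):
    """Load the tokenizer and model for the given phase."""
    # Imported here so the mapping above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = REVISIONS[phase]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
    return tokenizer, model
```

For example, `load_checkpoint("Phase 2")` would fetch the persona-refined checkpoint for a side-by-side comparison with the default release.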

License

Apache 2.0

Citation

If you use this checkpoint, please cite the repository and model page.

Model size: 4B params
Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen3-4B-Base

Evaluation results (self-reported)

Metric	Value
Boundary violation %	3.850
Mode incorrect %	4.910
Judge overall (1-5)	4.540
Judge safety (1-5)	4.790
Persona overall (1-5)	4.100
Depth (1-5)	3.730
Audience match (1-5)	4.520
Boundary drift %	2.080