Qwen3-0.6B-Medical-Finetuned-v1

This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for medical question-answering. It's designed to provide helpful, accurate medical information while emphasizing the importance of professional medical consultation.

πŸ₯ Model Description

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Custom medical Q&A dataset covering common health topics
  • Training: Optimized for conversational medical assistance

⚠️ Important Disclaimer

This model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for medical concerns. Do not use this model in emergency situations; call emergency services immediately.

🚀 Usage

With transformers

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a conversation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Build a ChatML-formatted prompt (the chat format Qwen models expect)
prompt = "<|im_start|>system\nYou are a helpful medical assistant providing accurate, evidence-based information.<|im_end|>\n<|im_start|>user\nWhat are the symptoms of hypertension?<|im_end|>\n<|im_start|>assistant\n"

# Generate a response (return_full_text=False strips the prompt from the output)
response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True, return_full_text=False)
print(response[0]["generated_text"])
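
Hand-writing the ChatML string works, but letting the tokenizer apply the model's chat template is less error-prone. A minimal sketch, reusing the tokenizer and pipe objects from above:

# Let the tokenizer apply Qwen's chat template instead of hand-writing ChatML
messages = [
    {"role": "system", "content": "You are a helpful medical assistant providing accurate, evidence-based information."},
    {"role": "user", "content": "What are the symptoms of hypertension?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True, return_full_text=False)
print(response[0]["generated_text"])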

🔧 GGUF Versions

This repository includes quantized GGUF versions for use with llama.cpp and compatible tools; a local-inference sketch follows the list:

  • Qwen3-0.6B-Medical-Finetuned-v1.fp16.gguf - 16-bit float (largest, best quality)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q8_0.gguf - 8-bit quantization (good balance)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q5_K_M.gguf - 5-bit quantization (smaller, fast)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf - 4-bit quantization (smallest, fastest)
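
The GGUF files can be run with any llama.cpp-compatible runtime. Below is a minimal sketch using the llama-cpp-python bindings; the choice of the Q4_K_M file and the context size are illustrative, and it assumes the GGUF metadata carries the chat template (typical for llama.cpp conversions):

from llama_cpp import Llama

# Download and load the 4-bit GGUF directly from the Hugging Face repo
llm = Llama.from_pretrained(
    repo_id="rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1",
    filename="Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf",
    n_ctx=4096,
)

# Chat-style generation using the template embedded in the GGUF
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are the symptoms of hypertension?"},
    ],
    max_tokens=300,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])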

Using with Ollama

# Pull the model (once published to the Ollama model library)
ollama pull rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1

# Run the model
ollama run rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1 "What are the early signs of diabetes?"
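
If the model is not yet published to the Ollama library, a locally downloaded GGUF can be registered through a Modelfile; the file name and the qwen3-medical alias below are illustrative:

# Point a Modelfile at a locally downloaded GGUF
echo 'FROM ./Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf' > Modelfile

# Register the model under a local alias and run it
ollama create qwen3-medical -f Modelfile
ollama run qwen3-medical "What are the early signs of diabetes?"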

📊 Training Details

  • Training Epochs: 2
  • Batch Size: 2 per device, with 4 gradient-accumulation steps (effective batch size 8)
  • Learning Rate: 2e-4
  • Optimizer: Paged AdamW 32-bit
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: Auto-detected linear layers
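
As a rough guide to reproducing this setup, the hyperparameters above map onto a peft + transformers configuration along the following lines; the dropout value and the "all-linear" module selector are assumptions, since the card only states that linear layers were auto-detected:

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the card (dropout is an assumed value)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # stand-in for "auto-detected linear layers"
    lora_dropout=0.05,            # assumption; not stated on the card
    task_type="CAUSAL_LM",
)

# Trainer settings from the card
training_args = TrainingArguments(
    output_dir="qwen3-medical-lora",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
)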
