Qwen3-0.6B-Medical-Finetuned-v1

This model is a fine-tuned version of Qwen/Qwen3-0.6B specialized for medical question-answering. It's designed to provide helpful, accurate medical information while emphasizing the importance of professional medical consultation.

πŸ₯ Model Description

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Custom medical Q&A dataset covering common health topics
  • Training: Optimized for conversational medical assistance

⚠️ Important Disclaimer

This model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare providers for medical concerns. Do not use this model in an emergency; call emergency services immediately.

🚀 Usage

With transformers

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a conversation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Create conversation
prompt = "<|im_start|>system\nYou are a helpful medical assistant providing accurate, evidence-based information.<|im_end|>\n<|im_start|>user\nWhat are the symptoms of hypertension?<|im_end|>\n<|im_start|>assistant\n"

# Generate response
response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True)
print(response[0]["generated_text"])
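
Instead of writing the ChatML tags by hand, the same prompt can be built from a message list with the tokenizer's standard chat-template API (a sketch; it should render an equivalent prompt to the manual string above):

# Build the ChatML prompt from a message list
messages = [
    {"role": "system", "content": "You are a helpful medical assistant providing accurate, evidence-based information."},
    {"role": "user", "content": "What are the symptoms of hypertension?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True)
print(response[0]["generated_text"])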

🔧 GGUF Versions

This repository includes quantized GGUF versions for use with llama.cpp and compatible tools:

  • Qwen3-0.6B-Medical-Finetuned-v1.fp16.gguf - 16-bit float (largest, best quality)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q8_0.gguf - 8-bit quantization (good balance)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q5_K_M.gguf - 5-bit quantization (smaller, faster)
  • Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf - 4-bit quantization (smallest, fastest)
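
For example, a downloaded GGUF file can be run directly with the llama.cpp CLI (a sketch; the binary is named llama-cli in recent builds and main in older ones, and the prompt shown is illustrative):

# Run the 4-bit model with llama.cpp
./llama-cli -m Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf \
    -p "What are the symptoms of hypertension?" \
    -n 300 --temp 0.7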

Using with Ollama

# Pull the model (once published to the Ollama registry)
ollama pull rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1

# Run the model
ollama run rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1 "What are the early signs of diabetes?"
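
Until then, you can build a local Ollama model from one of the GGUF files above (a minimal sketch; the local tag qwen3-medical is an arbitrary choice):

# Create a Modelfile pointing at a downloaded GGUF file
echo "FROM ./Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf" > Modelfile

# Build and run the local model
ollama create qwen3-medical -f Modelfile
ollama run qwen3-medical "What are the early signs of diabetes?"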

📊 Training Details

  • Training Epochs: 2
  • Batch Size: 2 per device (4 gradient-accumulation steps, for an effective batch size of 8)
  • Learning Rate: 2e-4
  • Optimizer: Paged AdamW 32-bit
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: Auto-detected linear layers
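
The hyperparameters above map to a PEFT/Transformers configuration roughly like the following (a sketch, not the exact training script; output_dir and any setting not listed above are assumptions):

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the values above
lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # LoRA alpha
    target_modules="all-linear",   # auto-detect all linear layers
    task_type="CAUSAL_LM",
)

# Trainer arguments matching the values above
training_args = TrainingArguments(
    output_dir="qwen3-medical-lora",   # assumption: not stated in the card
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,     # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_32bit",         # paged AdamW 32-bit
)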

Model created by rohitnagareddy using an automated Colab script.
