---
license: apache-2.0
language: en
tags:
- qwen3
- medical
- chat
- fine-tuned
- gguf
- healthcare
datasets:
- custom-medical-qa
model_type: qwen3
base_model: Qwen/Qwen3-0.6B
---

# Qwen3-0.6B-Medical-Finetuned-v1

This model is a fine-tuned version of `Qwen/Qwen3-0.6B` specialized for medical question-answering. It's designed to provide helpful, accurate medical information while emphasizing the importance of professional medical consultation.

## ๐Ÿฅ Model Description

- **Base Model**: `Qwen/Qwen3-0.6B`
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: Custom medical Q&A dataset covering common health topics.
- **Training**: Optimized for conversational medical assistance.

## โš ๏ธ Important Disclaimer

**This model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare providers for medical concerns. Do not use this model for emergency situations - call emergency services immediately.**

## 🚀 Usage

### With `transformers`

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a conversation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Create conversation
prompt = "<|im_start|>system\nYou are a helpful medical assistant providing accurate, evidence-based information.<|im_end|>\n<|im_start|>user\nWhat are the symptoms of hypertension?<|im_end|>\n<|im_start|>assistant\n"

# Generate response
response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True)
print(response[0]["generated_text"])
```
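
Instead of hand-writing the `<|im_start|>` markers, you can let the tokenizer render the chat template. A minimal sketch reusing the `tokenizer` and `pipe` objects from above (the sampling values are the same illustrative settings):

```python
# Build the ChatML prompt from a message list via the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful medical assistant providing accurate, evidence-based information."},
    {"role": "user", "content": "What are the symptoms of hypertension?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant turn header
)

response = pipe(prompt, max_new_tokens=300, temperature=0.7, top_p=0.9, do_sample=True)
print(response[0]["generated_text"])
```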

## 🔧 GGUF Versions

This repository includes quantized GGUF versions for use with `llama.cpp` and compatible tools:

- `Qwen3-0.6B-Medical-Finetuned-v1.fp16.gguf` - Full precision (largest, best quality)
- `Qwen3-0.6B-Medical-Finetuned-v1.Q8_0.gguf` - 8-bit quantization (good balance)
- `Qwen3-0.6B-Medical-Finetuned-v1.Q5_K_M.gguf` - 5-bit quantization (smaller, fast)
- `Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf` - 4-bit quantization (smallest, fastest)
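
The quantized files can also be loaded directly in Python with `llama-cpp-python`. A minimal sketch, assuming you have downloaded the Q4_K_M file locally (the path, context size, and sampling values are placeholders to adjust):

```python
from llama_cpp import Llama

# Path is a placeholder - point it at your local copy of the GGUF file
llm = Llama(
    model_path="./Qwen3-0.6B-Medical-Finetuned-v1.Q4_K_M.gguf",
    n_ctx=4096,      # context window
    n_threads=8,     # CPU threads to use
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant providing accurate, evidence-based information."},
        {"role": "user", "content": "What are the early signs of diabetes?"},
    ],
    max_tokens=300,
    temperature=0.7,
)
print(output["choices"][0]["message"]["content"])
```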

### Using with Ollama

```bash
# Pull the model (once available on the Hub)
ollama pull rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1

# Run the model
ollama run rohitnagareddy/Qwen3-0.6B-Medical-Finetuned-v1 "What are the early signs of diabetes?"
```

## 📊 Training Details

- **Training Epochs**: 2
- **Batch Size**: 2 per device, with 4 gradient accumulation steps (effective batch size of 8)
- **Learning Rate**: 2e-4
- **Optimizer**: Paged AdamW 32-bit
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Target Modules**: Auto-detected linear layers
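
For reference, a hedged sketch of how these hyperparameters map onto a `peft`/`transformers` configuration (the target module list, dropout, and output path are illustrative assumptions, not the exact training script):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings matching the values above; the module list is an
# assumption - the actual run auto-detected the linear layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed, not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

training_args = TrainingArguments(
    output_dir="./qwen3-medical-lora",  # placeholder path
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    logging_steps=10,
)
```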

---