---
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
library_name: transformers
language:
- en
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
license: gpl-3.0
datasets:
- bunyaminergen/Stable-Code-Python-SFT
pipeline_tag: text-generation
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE
---

# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled

Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled is a ~1B-parameter student model distilled from the
Qwen2.5-Coder-1.5B-Instruct-SFT teacher using token-based knowledge distillation.

---

### Table of Contents

- [Usage](#usage)
- [Dataset](#dataset)
- [Training](#training)
- [Licence](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)

---

### Usage

#### Hugging Face

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Load the tokenizer and model; device_map/torch_dtype="auto" pick the device and dtype.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."

# Build the prompt, tokenize it, and move the tensors to the model's device.
text = f"System: {system}\nUser: {user}\nAssistant:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
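The base Qwen2.5-Coder instruct models ship a chat template, so prompting through
`tokenizer.apply_chat_template` is usually preferable if the distilled checkpoint keeps that
template. The sketch below assumes it does and is otherwise equivalent to the snippet above.

```python
# A minimal sketch, assuming the distilled checkpoint keeps the Qwen2.5 chat template.
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]

# Render the conversation with the model's own template and append the assistant prefix.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(out_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```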

---

### Dataset

- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT)

---

### Training

#### Hyperparameters

| Hyperparameter                | Value                                           |
|-------------------------------|-------------------------------------------------|
| Base Model                    | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type                     | `CAUSAL_LM`                                     |
| Number of Epochs              | `11`                                            |
| Batch Size                    | `12`                                            |
| Gradient Accumulation Steps   | `2`                                             |
| Effective Batch Size          | `24` (12 × 2)                                   |
| Learning Rate                 | `5e-5`                                          |
| Optimizer                     | `AdamW`                                         |
| Precision                     | `BF16 Mixed Precision`                          |
| Evaluation Strategy           | `epoch`                                         |
| Max Sequence Length           | `256 tokens`                                    |
| Logging Strategy              | `epoch`                                         |
| Save Checkpoint Steps         | every `10000` steps                             |
| Experiment Tracking           | `MLflow` (local)                                |
| Experiment Name               | `StudentKnowledgeDistillation`                  |
| MLflow Run Name               | `StudentKD`                                     |
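For reference, the table above maps onto a Hugging Face `TrainingArguments` configuration roughly
like the sketch below; the exact argument names and output path are assumptions, not the original
training script.

```python
from transformers import TrainingArguments

# Rough sketch of the settings in the table above; names are assumptions.
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1b-distilled",  # hypothetical output path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,            # effective batch size 12 x 2 = 24
    learning_rate=5e-5,
    optim="adamw_torch",
    bf16=True,                                # BF16 mixed precision
    eval_strategy="epoch",
    logging_strategy="epoch",
    save_steps=10_000,
    report_to=["mlflow"],                     # local MLflow tracking
    run_name="StudentKD",
)
```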

#### Knowledge Distillation Configuration

| Parameter           | Value       |
|---------------------|-------------|
| Distillation Weight | `0.3`       |
| Temperature         | `0.5`       |
| Loss Reduction      | `batchmean` |
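As a reading aid, a token-level distillation loss with these settings typically blends the usual
cross-entropy on the labels with a temperature-scaled KL term between student and teacher logits.
The sketch below is illustrative only, not the exact training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      kd_weight=0.3, temperature=0.5):
    """Illustrative token-level KD loss using the values from the table above."""
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # KL divergence between temperature-softened student and teacher distributions,
    # reduced with "batchmean" as in the configuration above.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Blend the two terms with the distillation weight (0.3 on the KD term here).
    return kd_weight * kd + (1.0 - kd_weight) * ce
```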

#### Dataset

- **Train/Test Split:** `90%/10%`
- **Random Seed:** `42`
- **Train Batched:** `True`
- **Eval Batched:** `True`

#### Tokenizer Configuration

- **Truncation:** Enabled (`max_length=256`)
- **Masked Language Modeling (MLM):** `False`
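The split, truncation, and collator settings above correspond roughly to the `datasets` and
`transformers` calls sketched below; the variable names and the `text` column name are
assumptions for illustration.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# 90/10 train/test split with the fixed seed from above.
dataset = load_dataset("bunyaminergen/Stable-Code-Python-SFT", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)

tokenizer = AutoTokenizer.from_pretrained("bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT")

def tokenize_fn(batch):
    # Truncate to the 256-token max sequence length used in training;
    # the "text" column name is an assumption about the dataset schema.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# batched=True for both splits, matching the "Train/Eval Batched" settings.
tokenized = splits.map(tokenize_fn, batched=True)

# Causal LM collation: mlm=False means no masked-language-modeling masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```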

#### Speeds, Sizes, Times

- **Total Training Time:** ~7 hours
- **Checkpoint Frequency:** every `10000` steps
- **Checkpoint Steps:**
    - `checkpoint-10000`
    - `checkpoint-13200` *(final checkpoint)*

#### Compute Infrastructure

**Hardware:**

- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **94 GB**
- CPU: **16 vCPU**

**Software:**

- OS: **Ubuntu 22.04**
- Frameworks: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**

---

### Licence

- [LICENSE](LICENSE)

---

### Links

- [GitHub](https://github.com/bunyaminergen/)
- [Website](https://bunyaminergen.com)
- [LinkedIn](https://www.linkedin.com/in/bunyaminergen)

---

### Team

- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)

---

### Contact

- [Mail](mailto:[email protected])

---

### Citation

```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04}
}
```

---