---
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
library_name: transformers
language:
- en
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
license: gpl-3.0
datasets:
- bunyaminergen/Stable-Code-Python-SFT
pipeline_tag: text-generation
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE
---
# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled
Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled is a 1B-parameter student model distilled from
Qwen2.5-Coder-1.5B-Instruct-SFT using token-based knowledge distillation.
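In token-based (token-level) distillation, the student is trained to match the teacher's per-token output distribution. The generic objective is sketched below; this is the standard formulation, not necessarily the exact loss used for this checkpoint.
```latex
% Generic token-level KD objective: hard-label cross-entropy blended with a
% temperature-softened KL term between teacher and student token distributions.
\mathcal{L} \;=\; (1-\alpha)\,\mathcal{L}_{\mathrm{CE}}\bigl(y,\, p_{\text{student}}\bigr)
\;+\; \alpha\, T^{2}\, \mathrm{KL}\!\bigl(\operatorname{softmax}(z_{\text{teacher}}/T)\,\big\|\,\operatorname{softmax}(z_{\text{student}}/T)\bigr)
```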
---
### Table of Contents
- [Usage](#usage)
- [Dataset](#dataset)
- [Training](#training)
- [Licence](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)
---
### Usage
#### Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Load the tokenizer and model; "auto" lets transformers pick the device and dtype.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."
text = f"System: {system}\nUser: {user}\nAssistant:"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
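Because the base model is a Qwen2.5 Instruct variant, its tokenizer also ships a chat template. The variant below may work as well; it assumes the template was preserved through fine-tuning and reuses the `tokenizer` and `model` objects from the snippet above.
```python
# Alternative prompt construction via the tokenizer's built-in chat template
# (assumes the Qwen chat template was kept during fine-tuning).
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```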
---
### Dataset
- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT)
---
### Training
#### Hyperparameters
| Hyperparameter | Value |
|-------------------------------|-------------------------------------------------|
| Base Model | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type | `CAUSAL_LM` |
| Number of Epochs | `11` |
| Batch Size | `12` |
| Gradient Accumulation Steps | `2` |
| Effective Batch Size | `24` (12 × 2) |
| Learning Rate | `5e-5` |
| Optimizer | `AdamW` |
| Precision | `BF16 Mixed Precision` |
| Evaluation Strategy | `epoch` |
| Max Sequence Length | `256 tokens` |
| Logging Steps                 | once per `epoch`                                |
| Save Checkpoint Steps | every `10000` steps |
| Experiment Tracking | `MLflow` (local) |
| Experiment Name | `StudentKnowledgeDistillation` |
| MLflow Run Name | `StudentKD` |
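For reference, these settings map roughly onto the `transformers` `TrainingArguments` below. This is a hedged reconstruction, not the published training script: the output directory is an assumption, and the distillation itself additionally requires a custom loss (sketched after the next table).
```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1.5b-instruct-sft-distilled",  # assumed path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,   # effective batch size: 12 x 2 = 24
    learning_rate=5e-5,
    optim="adamw_torch",             # AdamW
    bf16=True,                       # BF16 mixed precision
    eval_strategy="epoch",           # "evaluation_strategy" on older transformers
    logging_strategy="epoch",
    save_steps=10_000,
    report_to=["mlflow"],            # local MLflow tracking
    run_name="StudentKD",
)
```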
#### Knowledge Distillation Configuration
| Parameter | Value |
|---------------------|-------------|
| Distillation Weight | `0.3` |
| Temperature | `0.5` |
| Loss Reduction | `batchmean` |
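A minimal sketch of how these three values could combine into the token-level objective shown earlier, with the distillation weight mixing the hard-label and soft-label terms. Variable names are illustrative; the actual training code is not published here.
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      distill_weight=0.3, temperature=0.5):
    """Token-level KD loss: (1 - w) * cross-entropy + w * softened KL."""
    vocab_size = student_logits.size(-1)
    # Hard-label term: next-token cross-entropy against the reference labels
    # (assumes logits and labels are already shifted/aligned).
    ce_loss = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        labels.view(-1),
        ignore_index=-100,
    )
    # Soft-label term: KL between temperature-scaled teacher and student
    # token distributions, using the "batchmean" reduction listed above.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab_size) / temperature, dim=-1),
        F.softmax(teacher_logits.view(-1, vocab_size) / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1.0 - distill_weight) * ce_loss + distill_weight * kd_loss
```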
#### Dataset
- **Train/Test Split:** `90%/10%`
- **Random Seed:** `42`
- **Train Batched:** `True`
- **Eval Batched:** `True`
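The split above corresponds to the following `datasets` call (a sketch assuming the dataset exposes a single `train` split):
```python
from datasets import load_dataset

dataset = load_dataset("bunyaminergen/Stable-Code-Python-SFT", split="train")
# 90% train / 10% test with a fixed seed, as listed above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```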
#### Tokenizer Configuration
- **Truncation:** Enabled (`max_length=256`)
- **Masked Language Modeling (MLM):** `False`
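In `transformers` terms this means truncating inputs to 256 tokens and using the language-modeling collator with `mlm=False`, i.e. plain causal-LM labels. The `text` column name is an assumption about the dataset schema.
```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained(
    "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT"
)

def tokenize_fn(batch):
    # Truncate every example to the 256-token maximum used during training.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# mlm=False -> causal language modeling (labels are the shifted input ids).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```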
#### Speeds, Sizes, Times
- **Total Training Time:** ~7 hours
- **Checkpoint Frequency:** every `10000` steps
- **Checkpoint Steps:**
- `checkpoint-10000`
- `checkpoint-13200` *(final checkpoint)*
#### Compute Infrastructure
**Hardware:**
- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **94 GB**
- CPU: **16 vCPU**
**Software:**
- OS: **Ubuntu 22.04**
- Frameworks: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**
---
### Licence
- [LICENSE](LICENSE)
---
### Links
- [GitHub](https://github.com/bunyaminergen/)
- [Website](https://bunyaminergen.com)
- [LinkedIn](https://www.linkedin.com/in/bunyaminergen)
---
### Team
- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)
---
### Contact
- [Mail](mailto:[email protected])
---
### Citation
```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
  url    = {https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled},
}
```
---