---
|
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT |
|
library_name: transformers |
|
language: |
|
- en |
|
tags: |
|
- code |
|
- codeqwen |
|
- chat |
|
- qwen |
|
- qwen-coder |
|
license: gpl-3.0 |
|
datasets: |
|
- bunyaminergen/Stable-Code-Python-SFT |
|
pipeline_tag: text-generation |
|
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE |
|
--- |
|
|
|
# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled |
|
|
|
The Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled model was distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to 1B parameters using token-based knowledge distillation.
|
|
|
--- |
|
|
|
### Table of Contents
|
|
|
- [Usage](#usage) |
|
- [Dataset](#dataset) |
|
- [Training](#training) |
|
- [Licence](#licence)
|
- [Links](#links) |
|
- [Team](#team) |
|
- [Contact](#contact) |
|
- [Citation](#citation) |
|
|
|
--- |
|
|
|
### Usage |
|
|
|
#### Hugging Face |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Left padding keeps prompt tokens right-aligned for decoder-only generation.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."

text = f"System: {system}\nUser: {user}\nAssistant:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
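Qwen2.5 instruct checkpoints normally ship with a chat template; assuming this distilled checkpoint retained it, the prompt can alternatively be built with `apply_chat_template` (a sketch reusing the objects above, not the card's canonical prompt format):

```python
# Alternative: build the prompt from the tokenizer's chat template
# (assumes the template survived SFT and distillation).
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```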
|
|
|
--- |
|
|
|
### Dataset |
|
|
|
- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT) |
|
|
|
--- |
|
|
|
### Training |
|
|
|
#### Hyperparameters |
|
|
|
| Hyperparameter                | Value                                           |
|-------------------------------|-------------------------------------------------|
| Base Model                    | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type                     | `CAUSAL_LM`                                     |
| Number of Epochs              | `11`                                            |
| Batch Size                    | `12`                                            |
| Gradient Accumulation Steps   | `2`                                             |
| Effective Batch Size          | `24` (12 × 2)                                   |
| Learning Rate                 | `5e-5`                                          |
| Optimizer                     | `AdamW`                                         |
| Precision                     | `BF16 Mixed Precision`                          |
| Evaluation Strategy           | `epoch`                                         |
| Max Sequence Length           | `256 tokens`                                    |
| Logging Strategy              | every `epoch`                                   |
| Save Checkpoint Steps         | every `10000` steps                             |
| Experiment Tracking           | `MLflow` (local)                                |
| Experiment Name               | `StudentKnowledgeDistillation`                  |
| MLflow Run Name               | `StudentKD`                                     |
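For reference, the table above maps onto a `transformers.TrainingArguments` configuration roughly like the following. This is a sketch, not the original training script; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Rough reconstruction of the hyperparameter table above.
training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1.5b-kd-student",  # placeholder path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,   # effective batch size: 12 x 2 = 24
    learning_rate=5e-5,
    optim="adamw_torch",             # AdamW
    bf16=True,                       # BF16 mixed precision
    eval_strategy="epoch",           # `evaluation_strategy` on older versions
    logging_strategy="epoch",
    save_steps=10_000,
    report_to=["mlflow"],            # local MLflow tracking
    seed=42,
)
```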
|
|
|
#### Knowledge Distillation Configuration |
|
|
|
| Parameter           | Value       |
|---------------------|-------------|
| Distillation Weight | `0.3`       |
| Temperature         | `0.5`       |
| Loss Reduction      | `batchmean` |
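The training code itself is not published here, so the following is a minimal sketch of how a token-based distillation loss with these settings is typically composed: a cross-entropy term on the ground-truth tokens blended with a temperature-scaled KL term between teacher and student token distributions. All function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

DISTILL_WEIGHT = 0.3  # weight of the KL term, per the table above
TEMPERATURE = 0.5

def kd_loss(student_logits, teacher_logits, labels):
    """Token-level knowledge distillation loss (illustrative)."""
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # KL divergence between temperature-softened teacher and student
    # per-token distributions, with the "batchmean" reduction above.
    kl = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * (TEMPERATURE ** 2)  # standard temperature rescaling
    return (1.0 - DISTILL_WEIGHT) * ce + DISTILL_WEIGHT * kl
```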
|
|
|
#### Dataset |
|
|
|
- **Train/Test Split:** `90%/10%` |
|
- **Random Seed:** `42` |
|
- **Train Batched:** `True` |
|
- **Eval Batched:** `True` |
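Assuming the split was produced with the standard `datasets` API, it corresponds to a call like this (a sketch; the split may have been created differently):

```python
from datasets import load_dataset

# 90/10 train/test split with the seed listed above.
dataset = load_dataset("bunyaminergen/Stable-Code-Python-SFT", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```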
|
|
|
#### Tokenizer Configuration |
|
|
|
- **Truncation:** Enabled (`max_length=256`) |
|
- **Masked Language Modeling (MLM):** `False` |
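In `transformers` terms, this means truncating inputs at 256 tokens and using the language-modeling collator with `mlm=False` (causal-LM labels). A minimal sketch, assuming the text column is named `text` (the actual field name is not documented here):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained(
    "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"
)

def tokenize_fn(batch):
    # Truncate every example to the 256-token maximum used in training.
    # The "text" column name is an assumption for illustration.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# mlm=False selects causal-LM labels (shifted inputs), not masked LM.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```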
|
|
|
#### Speeds, Sizes, Times |
|
|
|
- **Total Training Time:** ~7 hours |
|
- **Checkpoint Frequency:** every `10000` steps |
|
- **Checkpoint Steps:**
    - `checkpoint-10000`
    - `checkpoint-13200` *(final checkpoint)*
|
|
|
#### Compute Infrastructure |
|
|
|
**Hardware:** |
|
|
|
- GPU: **1 × NVIDIA L40S (48 GB VRAM)** |
|
- RAM: **94 GB** |
|
- CPU: **16 vCPU** |
|
|
|
**Software:** |
|
|
|
- OS: **Ubuntu 22.04** |
|
- Frameworks: **PyTorch 2.4.0** |
|
- CUDA Version: **12.4.1** |
|
|
|
--- |
|
|
|
### Licence |
|
|
|
- [LICENSE](LICENSE) |
|
|
|
--- |
|
|
|
### Links |
|
|
|
- [GitHub](https://github.com/bunyaminergen/)
|
- [Website](https://bunyaminergen.com) |
|
- [LinkedIn](https://www.linkedin.com/in/bunyaminergen)
|
|
|
--- |
|
|
|
### Team |
|
|
|
- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen) |
|
|
|
--- |
|
|
|
### Contact |
|
|
|
- [Mail](mailto:[email protected]) |
|
|
|
--- |
|
|
|
### Citation |
|
|
|
```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
  url    = {https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled},
}
```
|
|
|
--- |
|
|